Table of Contents

Supp Material for Luminous Ostracod Vargula tsujii

The R Juypter Notebook serves as a comprehensive repository encompasing most of the scripts and figures relevant to the bioluminescent ostracod V.tsujii analyses outlined in the publication. This notebook includes the following analyses: QC steps, WGCNA, Network Visualization, Network Connectivity, Differential Gene Expression and GO enrichments.

Author: Lisa Yeter Mesrop

Load libraries

In [ ]:
#load libraries 
library(WGCNA)
library(tidyverse) 
library(edgeR)
library(matrixStats)
library(DESeq2)
library(dplyr)
library(readxl)
library(data.table)
library(ggplot2)
library(ggrepel)
library(repr)
library(topGO)
library(reshape2)
library(plyr)
library(scales)
library(readxl)
library(repr)
library(RColorBrewer)
library(pheatmap)
library(flashClust)
library(ggvenn)


#always use the following WGCNA functions 
options(stringsAsFactors = FALSE)
enableWGCNAThreads()
allowWGCNAThreads(nThreads = 22)

Load data

The gene expression matrix consists of 57 samples and 41,486 genes before quality control (QC).

In [4]:
#read in count matrix
merged_counts <- read.csv("merged_single_network.csv", header = TRUE, row.names=1,
                  stringsAsFactors = FALSE)
In [5]:
#create the metadata 
meta <- data.frame(row.names = colnames(merged_counts))
In [6]:
sample_name = c("upper_lip_1","upper_lip_1","upper_lip_1","upper_lip_1","upper_lip_1","upper_lip_1","upper_lip_1","upper_lip_1",
                                  "upper_lip_1","upper_lip_2", "upper_lip_2", "upper_lip_2", "upper_lip_2", "upper_lip_2", 
                                  "eye", "gut", "eye", "gut", "eye", "gut","eye", "gut","eye", "gut","A", "A", "A", "AIF", "AIF", "AIF",
                                  "AII", "AII", "AII", "AIII", "AIII", "AIII", "AIM", "AIM", "AIM", "AIV", "AIV", "AIV","AV", "AV", "AV", "C","C","C","E", "E", "E", "G", "H", "H", "M", "M","M")
In [7]:
meta$sample_name <- sample_name
In [8]:
meta$names <- rownames(meta)
In [9]:
rownames(meta) <- NULL
In [10]:
head(meta)
A data.frame: 6 × 2
sample_namenames
<chr><chr>
1upper_lip_1vtu10.counts.tab
2upper_lip_1vtu1.counts.tab
3upper_lip_1vtu2.counts.tab
4upper_lip_1vtu3.counts.tab
5upper_lip_1vtu4.counts.tab
6upper_lip_1vtu5.counts.tab

QC for downstream analyses

As a preliminary quality control (QC) measure for WGCNA analysis, the overall similarity between samples and transcripts with low counts was assessed, as these counts often introduce noise in co-expression analyses. A filter was applied to the expression matrix, removing transcripts with fewer than 5 counts in more than 3 samples, given that some sample types included a minimum of 3 biological replicates. The QC analyses utilized functions from the DESeq2 package (Love et al., 2014).

In [ ]:
#import count table, meta and sample_name objects into a DESeq2 object 
dds_count_table <- DESeqDataSetFromMatrix(countData = merged_counts, colData = meta, design = ~sample_name)

Barplot of counts for all samples

The counts of filtered reads were examined using a bar plot.

In [12]:
#the number of prefiltered counts for each sample 
colSums(assay(dds_count_table))
vtu10.counts.tab
881916
vtu1.counts.tab
953139
vtu2.counts.tab
429668
vtu3.counts.tab
882596
vtu4.counts.tab
887784
vtu5.counts.tab
717749
vtu7.counts.tab
1433938
vtu8.counts.tab
488033
vtu9.counts.tab
896901
Vt.1A.counts.tab
1826043
Vt.2A.counts.tab
1388482
Vt.3A.counts.tab
1483095
Vt.4A.counts.tab
1214013
Vt.5A.counts.tab
1636430
Vt.1B.counts.tab
798885
Vt.1C.counts.tab
1365090
Vt.2B.counts.tab
342949
Vt.2C.counts.tab
1927280
Vt.3B.counts.tab
363212
Vt.3C.counts.tab
2470404
Vt.4B.counts.tab
943558
Vt.4C.counts.tab
1623438
Vt.5B.counts.tab
794665
Vt.5C.counts.tab
2014035
A.1.counts.tab
995203
A.2.counts.tab
583498
A.3.counts.tab
683680
AIF.1.counts.tab
1331646
AIF.2.counts.tab
1253052
AIF.3.counts.tab
688206
AII.1.counts.tab
993596
AII.2.counts.tab
808877
AII.3.counts.tab
976551
AIII.1.counts.tab
916076
AIII.2.counts.tab
765084
AIII.3.counts.tab
686605
AIM.1.counts.tab
906177
AIM.2.counts.tab
748870
AIM.3.counts.tab
910231
AIV.1.counts.tab
594556
AIV.2.counts.tab
741113
AIV.3.counts.tab
660913
AV.1.counts.tab
76179
AV.2.counts.tab
58955
AV.4.counts.tab
132446
C.1.counts.tab
969927
C.2.counts.tab
763769
C.3.counts.tab
851898
E.1.counts.tab
1054740
E.2.counts.tab
1084075
E.3.counts.tab
831205
G.1.counts.tab
1210799
H.1.counts.tab
815551
H.2.counts.tab
782567
M1.1.counts.tab
102298
M2.1.counts.tab
118595
M3.1.counts.tab
48696
In [4]:
#function from the R package repr to visualize figures in Jupyter Notebook(optional)
options(repr.plot.width=10, repr.plot.height=10)
In [13]:
#extract sample names for barplot 
names <- dds_count_table$sample_name
In [14]:
#visualize prefiltered raw counts in a barplot  

librarySizes <- colSums(assay(dds_count_table))

par(mar=c(10,5,2,2))  
barplot(librarySizes, 
        names=names, 
        las=2, 
        cex.names=.5) +title(main = "Barplot of Count Distributions of Samples", line = -1, outer = TRUE)

Remove samples with low counts for WGCNA

Based on the bar plot count distribution in Section 3.1, the following six samples were removed due to very low counts: M1.1.counts.tab, M2.1.counts.tab, M3.1.counts.tab, AV.1.counts.tab, AV.2.counts.tab, AV.4.counts.tab.

In [15]:
#update the merged counts, meta and sample_name objects with the six samples removed 
merged_counts_subset <- subset(merged_counts, select= -c(M1.1.counts.tab, M2.1.counts.tab, M3.1.counts.tab, AV.1.counts.tab, AV.2.counts.tab, AV.4.counts.tab))
meta_subset <- data.frame(row.names = colnames(merged_counts_subset)) 
sample_name_subset = c("upper_lip","upper_lip","upper_lip","upper_lip","upper_lip","upper_lip","upper_lip","upper_lip",
                                  "upper_lip","upper_lip", "upper_lip", "upper_lip", "upper_lip", "upper_lip", 
                                  "eye", "gut", "eye", "gut", "eye", "gut","eye", "gut","eye", "gut","A", "A", "A", "AIF", "AIF", "AIF",
                                  "AII", "AII", "AII", "AIII", "AIII", "AIII", "AIM", "AIM", "AIM", "AIV", "AIV", "AIV", "C","C","C","E", "E", "E", "G", "H", "H")


meta_subset$sample_name_subset <- sample_name_subset
meta_subset$names_subset <- rownames(meta_subset)
rownames(meta_subset) <- NULL
In [43]:
nrow(meta_subset)
51

Perform variance-stabilizing transformation for WGCNA

Perform variance-stabilizing transformation using DEseq2 package (Love et al., 2014). The authors of WGCNA recommend variance-stabilizing transformation as a normalization method before conducting network analyses (Langfelder and Horvath, 2008).

In [ ]:
#import updated count table, meta and sample_name objects into a DESeq2 object 
dds_count_table_subset<- DESeqDataSetFromMatrix(countData = merged_counts_subset, colData = meta_subset, design = ~sample_name_subset)
In [18]:
#filter the count table 
dds_merged_table_prefiltered <- dds_count_table_subset[rowSums(counts(dds_count_table_subset) >=5) >=3,]
In [19]:
nrow(dds_merged_table_prefiltered)
31097
In [20]:
#run DESeq2 function with variance-stabilizing transformation 
dds_subset <- DESeq(dds_merged_table_prefiltered, betaPrior = FALSE, parallel = TRUE)
#perform a variance-stabilizing transformation
vsd_subset <- varianceStabilizingTransformation(dds_subset)

overflow-y: scroll;
    max-height: 500px;
estimating size factors

estimating dispersions

gene-wise dispersion estimates: 38 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 38 workers

-- replacing outliers and refitting for 7985 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

Visualize transformed matrix with hierarchical clustering and PCA

In [21]:
#transpose the matrix 
sampleDists_subset <- dist(t(assay(vsd_subset)))
In [22]:
#perform heatmap
sampleDistMatrix <- as.matrix(sampleDists_subset)
rownames(sampleDistMatrix) <- paste(colData(dds_merged_table_prefiltered)$sample_name_subset) 
colnames(sampleDistMatrix) <- colData(dds_merged_table_prefiltered)$names_subset
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
          clustering_distance_rows=sampleDists_subset,
         clustering_distance_cols=sampleDists_subset,
         col=colors)
In [24]:
#perform PCA 

pcaData <- plotPCA(vsd_subset, intgroup="sample_name_subset", returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=sample_name_subset, shape=sample_name_subset)) +
  geom_point(size=5) + 
labs(color = "Sample Types")+ labs(shape = "Sample Types")+
scale_shape_manual(values = c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))+
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed()+theme(
    panel.grid.major = element_line(colour = "gray97", size = 1),
    panel.grid.minor = element_line(linetype = "dotted"),
    panel.background = element_rect(fill = NA),
    legend.key = element_rect(fill = "gray100"),
    axis.line = element_line(size = 0.5, linetype = "solid"),
   panel.border = element_rect(colour = "black", fill = NA, size = 1)
  ) + theme(
  axis.title.x = element_text(size = 16),  
  axis.title.y = element_text(size = 16), 
  legend.title = element_text(size = 16), 
  legend.text = element_text(size = 12)    
)

WGCNA

After completing the prefiltering and normalization steps above, the count matrix now includes 51 samples and 31,097 transcript and is ready for WGCNA. WGCNA (Weighted Gene Co-expression Network Analysis) identifies clusters (modules) of highly correlated genes by constructing a network based on pairwise correlations between gene expression profiles (Langfelder and Horvath, 2008). These modules often correspond to specific biological processes or pathways, indicating that the genes within a module may be part of the same regulatory process. By analyzing the connectivity of genes within each module, WGCNA also helps identify key drivers or hub genes, providing insights into gene regulation and the biological processes they govern. The following scripts are from the WGCNA package (Langfelder and Horvath, 2008).

Load data

In [25]:
#use the prefiltered and normalized count matrix from the DEseq2
vsd_subset_matrix <- assay(vsd_subset)
## many functions expect the matrix to be transposed
datExpr <- t(vsd_subset_matrix) 
## check rows/cols
nrow(datExpr)
ncol(datExpr)
51
31097
In [26]:
#ready to start WGCNA Analysis 

#run this to check if there are gene outliers
gsg=goodSamplesGenes(datExpr, verbose = 3)
gsg$allOK
 Flagging genes and samples with too many missing values...
  ..step 1
TRUE

Pick a soft-threshold power

In [27]:
# choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to=20, by=2))
# call the network topology analysis function
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
pickSoftThreshold: will use block size 1438.
 pickSoftThreshold: calculating connectivity for given powers...
   ..working on genes 1 through 1438 of 31097
   ..working on genes 1439 through 2876 of 31097
   ..working on genes 2877 through 4314 of 31097
   ..working on genes 4315 through 5752 of 31097
   ..working on genes 5753 through 7190 of 31097
   ..working on genes 7191 through 8628 of 31097
   ..working on genes 8629 through 10066 of 31097
   ..working on genes 10067 through 11504 of 31097
   ..working on genes 11505 through 12942 of 31097
   ..working on genes 12943 through 14380 of 31097
   ..working on genes 14381 through 15818 of 31097
   ..working on genes 15819 through 17256 of 31097
   ..working on genes 17257 through 18694 of 31097
   ..working on genes 18695 through 20132 of 31097
   ..working on genes 20133 through 21570 of 31097
   ..working on genes 21571 through 23008 of 31097
   ..working on genes 23009 through 24446 of 31097
   ..working on genes 24447 through 25884 of 31097
   ..working on genes 25885 through 27322 of 31097
   ..working on genes 27323 through 28760 of 31097
   ..working on genes 28761 through 30198 of 31097
   ..working on genes 30199 through 31097 of 31097
   Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
1      1    0.157 -0.658          0.818  6860.0  6.55e+03  12600
2      2    0.787 -1.170          0.915  2420.0  1.96e+03   6880
3      3    0.894 -1.170          0.885  1110.0  6.97e+02   4360
4      4    0.929 -1.120          0.912   617.0  2.78e+02   3180
5      5    0.943 -1.040          0.981   392.0  1.21e+02   2500
6      6    0.940 -1.010          0.991   275.0  5.58e+01   2150
7      7    0.943 -0.978          0.994   207.0  2.74e+01   1910
8      8    0.947 -0.957          0.985   164.0  1.41e+01   1740
9      9    0.948 -0.943          0.970   135.0  7.49e+00   1610
10    10    0.951 -0.929          0.960   114.0  4.12e+00   1500
11    12    0.950 -0.914          0.947    86.6  1.37e+00   1330
12    14    0.936 -0.916          0.919    68.8  4.98e-01   1200
13    16    0.922 -0.920          0.900    56.4  1.96e-01   1090
14    18    0.923 -0.923          0.902    47.3  8.15e-02    999
15    20    0.914 -0.930          0.891    40.3  3.58e-02    920
In [28]:
# plot the results:
cex1 = 0.9;
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
     main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels=powers,cex=cex1,col="red");
# this line corresponds to using an R^2 cut-off of h
abline(h=0.90,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
     xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
     main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

Run co-expression similarity and adjacency and TOM

In [44]:
# co-expression similarity and adjacency using assigned softpower
softPower=8
adjacency = adjacency(datExpr, power = softPower)
In [45]:
# calculate the Topological Overlap Matrix (TOM)
# turn adjacency into topological overlap, i.e. translate the adjacency into 
# topological overlap matrix and calculate the corresponding dissimilarity 
TOM = TOMsimilarity(adjacency, TOMType = "signed", verbose = 5);
dissTOM = 1-TOM;
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.

Cluster genes by TOM and merge modules with very similar expression profiles

In [46]:
# call the hierarchical clustering function
geneTree = hclust(as.dist(dissTOM), method = "average");
In [47]:
# plot the resulting clustering tree (dendrogram)
plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
     labels = FALSE, hang = 0.04);
In [48]:
# set the min module size
minModuleSize = 30;
# module identification using dynamic tree cut:
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
                deepSplit = 2, pamRespectsDendro = FALSE,
                minClusterSize = minModuleSize);
 ..cutHeight not given, setting it to 0.995  ===>  99% of the (truncated) height range in dendro.
 ..done.
In [49]:
table(dynamicMods)
dynamicColors = labels2colors(dynamicMods)
dynamicMods
    0     1     2     3     4     5     6     7     8     9    10    11    12 
 6310 10309  7139  2726  1603   944   890   332   289   188   178    82    56 
   13 
   51 
In [50]:
table(dynamicColors)
dynamicColors
      black        blue       brown       green greenyellow        grey 
        332        7139        2726         944          82        6310 
    magenta        pink      purple         red      salmon         tan 
        188         289         178         890          51          56 
  turquoise      yellow 
      10309        1603 
In [51]:
# plot the dendrogram and colors underneath
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05,
                    main = "Gene dendrogram and module colors")
In [52]:
# calculate eigengenes
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
# calculate dissimilarity of module eigengenes
MEDiss = 1-cor(MEs);
# cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average");
# plot the result
plot(METree, main = "Clustering of module eigengenes",
     xlab = "", sub = "")
MEDissThres = 0.25
# plot the cut line into the dendrogram
abline(h=MEDissThres, col = "red")
In [53]:
# call an automatic merging function
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)
# merged module colors
mergedColors = merge$colors;
# eigengenes of the new merged modules 
mergedMEs = merge$newMEs;
 mergeCloseModules: Merging modules whose distance is less than 0.25
   multiSetMEs: Calculating module MEs.
     Working on set 1 ...
     moduleEigengenes: Calculating 14 module eigengenes in given set.
   multiSetMEs: Calculating module MEs.
     Working on set 1 ...
     moduleEigengenes: Calculating 13 module eigengenes in given set.
   Calculating new MEs...
   multiSetMEs: Calculating module MEs.
     Working on set 1 ...
     moduleEigengenes: Calculating 13 module eigengenes in given set.
In [54]:
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),
                    c("Dynamic Tree Cut", "Merged dynamic"),
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05)

Identify the module that contains Vargula tsujii c-luciferase

The functionally demonstrated c-luciferase gene, along with some phylogenetically similar c-luciferase genes, are located in the red module, which we refer to as the Bioluminescent Co-Regulatory Network (BCN).

In [55]:
#determine which column number in the datExpr object that corresponds to c-luciferase
which(colnames(datExpr) == "NODE_10321_length_1972_cov_1770.41_g3092_i1")
389
In [56]:
#determine the color of the module with c-luciferase
dynamicColors[[389]]
'red'
In [57]:
# extract all modules as a table for downstream analyses 
SubGeneNames = colnames(datExpr)
for (color in dynamicColors){
  module=SubGeneNames[which(dynamicColors==color)]
  write.table(module, paste("module_",color, ".txt",sep=""), sep="\t", row.names=FALSE, col.names=FALSE,quote=FALSE)
}
In [58]:
# extract the red module of interest from the dynamicColors object 
SubGeneNames = colnames(datExpr)
red=as.data.frame(SubGeneNames[which(dynamicColors=="red")])
names(red)[1] <- "transcript_id"
In [59]:
head(red)
A data.frame: 6 × 1
transcript_id
<chr>
1NODE_100020_length_140_cov_8.6_g92800_i0
2NODE_10035_length_2011_cov_7.68814_g7072_i0
3NODE_1015_length_4889_cov_22.1146_g714_i0
4NODE_10218_length_1985_cov_18.5554_g7198_i0
5NODE_102839_length_135_cov_19.3_g95619_i0
6NODE_10321_length_1972_cov_1770.41_g3092_i1

Import the annotation for Vargula tsujii transcriptome

In [ ]:
#read in the Trinotate sheet for Vargula tsujii 
Trinotate_lym_subset <- read_excel("Trinotate_lym_subset.xlsx")

Identify other modules of interest and incorporate annotation information

In [66]:
#incorporate annotation for the red module (the BCN)
red_module_gene_annot <- setDT(Trinotate_lym_subset, key = 'transcript_id')[J(red)]
In [67]:
# red module (the BCN)
head(red_module_gene_annot)
A data.table: 6 × 16
#gene_idtranscript_idsprot_Top_BLASTX_hitRNAMMERprot_idprot_coordssprot_Top_BLASTP_hitPfamSignalPTmHMMeggnogKegggene_ontology_blastgene_ontology_pfamtranscriptpeptide
<chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>
g92800NODE_100020_length_140_cov_8.6_g92800_i0 . .. . . . . . . . . . ..
g7072 NODE_10035_length_2011_cov_7.68814_g7072_i0. .. . . . . . . . . . ..
g714 NODE_1015_length_4889_cov_22.1146_g714_i0 ROA1_SCHAM^ROA1_SCHAM^Q:4777-4220,H:1-186^67.725%ID^E:7.26e-70^RecName: Full=Heterogeneous nuclear ribonucleoprotein A1, A2/B1 homolog;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Polyneoptera; Orthoptera; Caelifera; Acrididea; Acridomorpha; Acridoidea; Acrididae; Cyrtacanthacridinae; Schistocerca.NODE_1015_length_4889_cov_22.1146_g714_i0.p1 4016-4768[-]ROA1_SCHAM^ROA1_SCHAM^Q:1-183,H:4-186^68.817%ID^E:2.58e-86^RecName: Full=Heterogeneous nuclear ribonucleoprotein A1, A2/B1 homolog;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Polyneoptera; Orthoptera; Caelifera; Acrididea; Acridomorpha; Acridoidea; Acrididae; Cyrtacanthacridinae; SchistocercaPF00076.22^RRM_1^RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain)^14-81^E:7.4e-17`PF00076.22^RRM_1^RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain)^105-175^E:1.5e-17. . . . GO:0005634^cellular_component^nucleus`GO:0003729^molecular_function^mRNA binding`GO:0007281^biological_process^germ cell development GO:0003676^molecular_function^nucleic acid binding ..
g7198 NODE_10218_length_1985_cov_18.5554_g7198_i0WSD_ACIAD^WSD_ACIAD^Q:1368-265,H:62-446^21.76%ID^E:1.15e-12^RecName: Full=O-acyltransferase WSD;^Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter .NODE_10218_length_1985_cov_18.5554_g7198_i0.p1256-1953[-] WSD_ACIAD^WSD_ACIAD^Q:196-563,H:62-446^22.005%ID^E:9.38e-13^RecName: Full=O-acyltransferase WSD;^Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter PF06974.13^DUF1298^Protein of unknown function (DUF1298)^418-558^E:2.4e-27 . ExpAA=83.94^PredHel=4^Topology=i36-58o73-95i452-474o504-526iENOG410XS7A^Acyltransferase, ws dgat mgatKEGG:aci:ACIAD0832`KO:K00635GO:0102966^molecular_function^arachidoyl-CoA:1-dodecanol O-acyltransferase activity`GO:0004144^molecular_function^diacylglycerol O-acyltransferase activity`GO:0047196^molecular_function^long-chain-alcohol O-fatty-acyltransferase activity`GO:0006071^biological_process^glycerol metabolic process`GO:0019432^biological_process^triglyceride biosynthetic processGO:0004144^molecular_function^diacylglycerol O-acyltransferase activity..
g95619NODE_102839_length_135_cov_19.3_g95619_i0 . .. . . . . . . . . . ..
g3092 NODE_10321_length_1972_cov_1770.41_g3092_i1LUCI_VARHI^LUCI_VARHI^Q:106-1749,H:2-555^47.227%ID^E:0^RecName: Full=Luciferin 2-monooxygenase;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Crustacea; Oligostraca; Ostracoda; Myodocopa; Myodocopida; Cypridinoidea; Cypridinidae; Vargula .NODE_10321_length_1972_cov_1770.41_g3092_i1.p194-1752[+] LUCI_VARHI^LUCI_VARHI^Q:5-552,H:2-555^47.227%ID^E:0^RecName: Full=Luciferin 2-monooxygenase;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Crustacea; Oligostraca; Ostracoda; Myodocopa; Myodocopida; Cypridinoidea; Cypridinidae; Vargula PF00094.25^VWD^von Willebrand factor type D domain^80-234^E:2.7e-14`PF00094.25^VWD^von Willebrand factor type D domain^319-460^E:3.6e-21 sigP:1^18^0.906^YES. . . GO:0016831^molecular_function^carboxy-lyase activity`GO:0047712^molecular_function^Cypridina-luciferin 2-monooxygenase activity`GO:0008218^biological_process^bioluminescence . ..
In [7]:
#extract the module that is significantly correlated with the gut sample (from Section 4.8) from the dynamicColors object   
purple_gut=as.data.frame(SubGeneNames[which(dynamicColors=="purple")])
names(purple_gut)[1] <- "transcript_id"
purple_gut_gene_annot <- setDT(Trinotate_lym_subset, key = 'transcript_id')[J(purple_gut)]
In [31]:
#extract the module that is correlated with forced lum sample (from Section 4.8) from the dynamicColors object   
SubGeneNames = colnames(datExpr)
salmon_forced_lum=as.data.frame(SubGeneNames[which(dynamicColors=="salmon")])
names(salmon_forced_lum)[1] <- "transcript_id"

#match the annotation with each transcript 
salmon_forced_lum_module_gene_annot <- setDT(Trinotate_lym_subset, key = 'transcript_id')[J(salmon_forced_lum)]

Module to trait heatmap

Identify modules (network) that are significantly associated with samples. The module eigengene provides a representative measure of the gene expression patterns within a module, allowing correlation with these traits to determine the most significant associations (Langfelder and Horvath, 2008). This correlation can then be visualized in a module-to-trait heatmap to determine the most significant associations.The following scripts are from the WGCNA package (Langfelder and Horvath, 2008).

In [68]:
traitData <- read_excel("traits_all_tsujii_updated.xlsx")
In [69]:
traitData <- as.data.frame(traitData)
In [70]:
names(traitData)
  1. 'Samples'
  2. 'Samples_names'
  3. 'Bio_UpperLip'
  4. 'Gut'
  5. 'Eye'
  6. 'Instar_Stage_A'
  7. 'Instar_Stage_AI'
  8. 'Instar_Stage_AII'
  9. 'Instar_Stage_AIII'
  10. 'Instar_Stage_AIV'
  11. 'Adult_Morning_Timepoint'
  12. 'Adult_forced_biolum'
In [72]:
head(traitData)
A data.frame: 6 × 12
SamplesSamples_namesBio_UpperLipGutEyeInstar_Stage_AInstar_Stage_AIInstar_Stage_AIIInstar_Stage_AIIIInstar_Stage_AIVAdult_Morning_TimepointAdult_forced_biolum
<chr><chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
1A.1.counts.tab A 0001000000
2A.2.counts.tab A 0001000000
3A.3.counts.tab A 0001000000
4AIF.1.counts.tabAIF0000100000
5AIF.2.counts.tabAIF0000100000
6AIF.3.counts.tabAIF0000100000
In [73]:
allTraits = traitData
In [74]:
head(allTraits)
A data.frame: 6 × 12
SamplesSamples_namesBio_UpperLipGutEyeInstar_Stage_AInstar_Stage_AIInstar_Stage_AIIInstar_Stage_AIIIInstar_Stage_AIVAdult_Morning_TimepointAdult_forced_biolum
<chr><chr><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
1A.1.counts.tab A 0001000000
2A.2.counts.tab A 0001000000
3A.3.counts.tab A 0001000000
4AIF.1.counts.tabAIF0000100000
5AIF.2.counts.tabAIF0000100000
6AIF.3.counts.tabAIF0000100000
In [75]:
names(allTraits)
  1. 'Samples'
  2. 'Samples_names'
  3. 'Bio_UpperLip'
  4. 'Gut'
  5. 'Eye'
  6. 'Instar_Stage_A'
  7. 'Instar_Stage_AI'
  8. 'Instar_Stage_AII'
  9. 'Instar_Stage_AIII'
  10. 'Instar_Stage_AIV'
  11. 'Adult_Morning_Timepoint'
  12. 'Adult_forced_biolum'
In [76]:
All_samples = rownames(datExpr)
In [77]:
traitRows = match(All_samples, allTraits$Samples)
In [78]:
traitRows
  1. 28
  2. 29
  3. 30
  4. 31
  5. 32
  6. 33
  7. 34
  8. 35
  9. 36
  10. 37
  11. 38
  12. 39
  13. 40
  14. 41
  15. 42
  16. 43
  17. 44
  18. 45
  19. 46
  20. 47
  21. 48
  22. 49
  23. 50
  24. 51
  25. 1
  26. 2
  27. 3
  28. 4
  29. 5
  30. 6
  31. 7
  32. 8
  33. 9
  34. 10
  35. 11
  36. 12
  37. 13
  38. 14
  39. 15
  40. 16
  41. 17
  42. 18
  43. 19
  44. 20
  45. 21
  46. 22
  47. 23
  48. 24
  49. 25
  50. 26
  51. 27
In [79]:
datTraits = allTraits[traitRows, -1]
In [80]:
rownames(datTraits) = allTraits[traitRows, 1]
In [82]:
# double check that row names agree
table(rownames(datTraits)==rownames(datExpr))
TRUE 
  51 
In [85]:
datTraits$Samples_names <- NULL
In [86]:
head(datTraits)
A data.frame: 6 × 10
Bio_UpperLipGutEyeInstar_Stage_AInstar_Stage_AIInstar_Stage_AIIInstar_Stage_AIIIInstar_Stage_AIVAdult_Morning_TimepointAdult_forced_biolum
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
vtu10.counts.tab1000000000
vtu1.counts.tab1000000000
vtu2.counts.tab1000000000
vtu3.counts.tab1000000000
vtu4.counts.tab1000000000
vtu5.counts.tab1000000000
In [87]:
# sample network based on squared Euclidean distance
A=adjacency(t(datExpr),type="distance")
# this calculates the whole network connectivity
k=as.numeric(apply(A,2,sum))-1
# standardized connectivity
Z.k=scale(k)
In [88]:
# designate samples as outlying
# if their Z.k value is below the threshold
thresholdZ.k=-5 # often -2.5
# the color vector indicates outlyingness (red)
outlierColor=ifelse(Z.k<thresholdZ.k,"red","black")
# calculate the cluster tree using flahsClust or hclust
sampleTree = flashClust(as.dist(1-A), method = "average")
# convert traits to a color representation:
# where red indicates high values
traitColors=data.frame(numbers2colors(datTraits,signed=FALSE))
dimnames(traitColors)[[2]]=paste(names(datTraits),"C",sep="")
datColors=data.frame(outlierC=outlierColor,traitColors)
# plot the sample dendrogram and the colors underneath.
plotDendroAndColors(sampleTree,groupLabels=names(datColors),
colors=datColors,main="Sample dendrogram and trait heatmap")
In [90]:
# choose a module assignment
moduleColors_Vtsujii=dynamicColors
# define numbers of genes and samples
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
# recalculate MEs with color labels
MEs0 = moduleEigengenes(datExpr,moduleColors_Vtsujii)$eigengenes
MEsFemale = orderMEs(MEs0)
modTraitCor = cor(MEsFemale, datTraits, use = "p")
modTraitP = corPvalueStudent(modTraitCor, nSamples)
#color code each association by the correlation value; display correlations and their p-values
textMatrix = paste(signif(modTraitCor, 2), "\n(",
signif(modTraitP, 1), ")", sep = "")
dim(textMatrix) = dim(modTraitCor)
par(mar = c(8, 8.5, 3, 3))
# display the correlation values within a heatmap plot
labeledHeatmap(Matrix = modTraitCor, xLabels = names(datTraits),
yLabels = names(MEsFemale), ySymbols = names(MEsFemale),
colorLabels =FALSE,colors = blueWhiteRed(50),textMatrix=textMatrix,
setStdMargins = FALSE, cex.text = 0.5, zlim = c(-1,1),
main = paste("Module-trait relationships"))

Network visualization

The Bioluminescent Co-regulatory Network (BCN) has a total of 889 co-expressed transcripts making it challenging to visualize the entire network topology comprehensively. To address this, we used Cytoscape to visualize the network and we subsetted the network to only include genes that are signifcantly upregulated (uniquely) in the bioluminescent upper lip from Section 9.4.

Export BCN for Cytoscape

Export the network of interest into edge and node list files for Cytoscape (Shannon et al., 2003).

In [5]:
# select module of interest
module = "red"
# select module probes
genes = colnames(datExpr)
inModule = dynamicColors==module
modProbes = genes[inModule]
# select the corresponding Topological Overlap
modTOM = TOM_visnet_power8[inModule, inModule]
dimnames(modTOM) = list(modProbes, modProbes) 

# export the network into edge and node list files Cytoscape 
cyt = exportNetworkToCytoscape(modTOM,
edgeFile = paste("CytoscapeInput-edges-", paste(module, collapse="-"), ".txt", sep=""),
nodeFile = paste("CytoscapeInput-nodes-", paste(module, collapse="-"), ".txt", sep=""),
weighted = TRUE,
threshold = 0.02,
nodeNames = modProbes,
altNodeNames = modProbes,
nodeAttr = dynamicColors[inModule])
In [128]:
#determine how many of the significantly upregulated genes expressed in the bioluminescent upper lip (from Section 9.4) are found in the BCN 
BCN_DE_unique_Bio_UpperLip  <- red  %>%
  filter(transcript_id %in% unique_genes_bio_upper_lip_info_annot$transcript_id)

#add annotation 
BCN_DE_unique_Bio_UpperLip_annot <- left_join(BCN_DE_unique_Bio_UpperLip,Trinotate_lym_subset,by="transcript_id")

GO enrichment analyses for BCN

GO enrichment analyses was performed using topGO package (Alexa Rahnenfuhrer, 2024). The figures representing the GO enrichments presented here were utilized for analysis, but were not included in the main text of the publication. The GO enrichments were visualized with GO-Figure! package for publication (Reijnders and Waterhouse, 2021). GO-Figure! reference https://github.com/lmesrop/BCN_publication/tree/main/Go-Figure!.

Import GO annotations

In [91]:
#import the trinotate go sheet from the Trinotate output 
geneID2GO <- readMappings(file ="Trinotate_go_lym.txt")
geneNames <- names(geneID2GO)
In [92]:
#save the transcript ids of all the annotated genes under geneNames object 
geneNames<- as.character(Trinotate_lym_subset$transcript_id)
#save the transcript 
myInterestingGenes= as.character(red[,1])
In [94]:
#subset the genesNames by the transcript IDs in my red module 
geneList <- factor(as.integer(geneNames %in% myInterestingGenes))
names(geneList) <- geneNames
head(geneList)
NODE_100000_length_141_cov_1.05814_g92780_i0
0
NODE_100001_length_141_cov_1.03488_g92781_i0
0
NODE_100002_length_141_cov_0.325581_g92782_i0
0
NODE_100003_length_141_cov_0.0116279_g92783_i0
0
NODE_100004_length_141_cov_0_g92784_i0
0
NODE_100005_length_141_cov_0_g92785_i0
0
Levels:
  1. '0'
  2. '1'

Use topGO to identify enriched biological processes in the BCN

In [96]:
#run the topGO function. 
GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList,
                    annot = annFUN.gene2GO, gene2GO = geneID2GO)

results_go <- runTest(GOdata, algorithm="weight01", statistic="Fisher")
Building most specific GOs .....

	( 12645 GO terms found. )


Build GO DAG topology ..........

	( 13427 GO terms and 31130 relations. )


Annotating nodes ...............

	( 15506 genes annotated to the GO terms. )


			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 2181 nontrivial nodes
		 parameters: 
			 test statistic: fisher


	 Level 16:	1 nodes to be scored	(0 eliminated genes)


	 Level 15:	6 nodes to be scored	(0 eliminated genes)


	 Level 14:	11 nodes to be scored	(6 eliminated genes)


	 Level 13:	30 nodes to be scored	(210 eliminated genes)


	 Level 12:	67 nodes to be scored	(559 eliminated genes)


	 Level 11:	109 nodes to be scored	(1809 eliminated genes)


	 Level 10:	171 nodes to be scored	(3564 eliminated genes)


	 Level 9:	243 nodes to be scored	(5410 eliminated genes)


	 Level 8:	267 nodes to be scored	(6796 eliminated genes)


	 Level 7:	350 nodes to be scored	(9466 eliminated genes)


	 Level 6:	362 nodes to be scored	(11754 eliminated genes)


	 Level 5:	293 nodes to be scored	(13123 eliminated genes)


	 Level 4:	169 nodes to be scored	(14324 eliminated genes)


	 Level 3:	82 nodes to be scored	(14897 eliminated genes)


	 Level 2:	19 nodes to be scored	(15205 eliminated genes)


	 Level 1:	1 nodes to be scored	(15491 eliminated genes)

In [97]:
#retrieve the GO enrichment 
goEnrichment   <- GenTable(GOdata, Fisher = results_go, orderBy = "Fisher", topNodes = 100, numChar=1000)
In [98]:
#graph the GO enrichment 
goEnrichment$Fisher <- as.numeric(goEnrichment$Fisher)
goEnrichment <- goEnrichment[goEnrichment$Fisher < 0.05,] 
goEnrichment <- goEnrichment[goEnrichment$Significant > 2,]#remove any singletons
In [99]:
goEnrichment
A data.frame: 39 × 6
GO.IDTermAnnotatedSignificantExpectedFisher
<chr><chr><int><int><dbl><dbl>
1GO:0001519peptide amidation 1912 0.222.400e-19
2GO:0032504multicellular organism reproduction 9692411.444.100e-14
3GO:0009620response to fungus 6012 0.714.100e-13
4GO:0044719regulation of imaginal disc-derived wing size 4911 0.589.000e-12
5GO:0006508proteolysis 15403318.171.300e-08
6GO:0007613memory 10911 1.296.400e-08
7GO:0055114oxidation-reduction process 10033411.847.000e-08
8GO:0030206chondroitin sulfate biosynthetic process 31 5 0.372.900e-05
10GO:2000098negative regulation of smooth muscle cell-matrix adhesion 24 4 0.281.700e-04
11GO:0060588negative regulation of lipoprotein lipid oxidation 24 4 0.281.700e-04
12GO:2000405negative regulation of T cell migration 24 4 0.281.700e-04
13GO:0018401peptidyl-proline hydroxylation to 4-hydroxy-L-proline 25 4 0.302.000e-04
14GO:0071638negative regulation of monocyte chemotactic protein-1 production 27 4 0.322.700e-04
15GO:1900016negative regulation of cytokine production involved in inflammatory response 28 4 0.333.100e-04
16GO:0010642negative regulation of platelet-derived growth factor receptor signaling pathway 28 4 0.333.100e-04
17GO:0048662negative regulation of smooth muscle cell proliferation 31 4 0.374.600e-04
18GO:0014012peripheral nervous system axon regeneration 31 4 0.374.600e-04
19GO:0051895negative regulation of focal adhesion assembly 35 4 0.417.400e-04
20GO:0006869lipid transport 34613 4.088.000e-04
21GO:0042308negative regulation of protein import into nucleus 43 4 0.511.620e-03
23GO:0051592response to calcium ion 78 5 0.922.280e-03
24GO:0016540protein autoprocessing 23 3 0.272.410e-03
29GO:0048678response to axon injury 70 6 0.836.860e-03
30GO:0055085transmembrane transport 10502012.396.870e-03
31GO:0072347response to anesthetic 61 5 0.726.920e-03
32GO:0070555response to interleukin-1 61 3 0.727.070e-03
33GO:0042246tissue regeneration 67 4 0.798.070e-03
43GO:0016192vesicle-mediated transport 13401615.811.268e-02
47GO:0007585respiratory gaseous exchange 25 3 0.301.661e-02
49GO:0030512negative regulation of transforming growth factor beta receptor signaling pathway 46 3 0.541.693e-02
50GO:0008218bioluminescence 46 3 0.541.693e-02
51GO:0016051carbohydrate biosynthetic process 215 6 2.541.766e-02
54GO:0009058biosynthetic process 57433267.782.148e-02
66GO:0006749glutathione metabolic process 104 5 1.232.479e-02
68GO:0007274neuromuscular synaptic transmission 58 3 0.683.107e-02
80GO:0042391regulation of membrane potential 157 7 1.853.831e-02
83GO:1903533regulation of protein targeting 138 5 1.634.085e-02
85GO:0010043response to zinc ion 41 3 0.484.528e-02
86GO:0009749response to glucose 103 3 1.224.561e-02
In [100]:
#extract transcript ids that are significantly enriched in the BCN
myterms =goEnrichment$GO.ID 
mygenes = genesInTerm(GOdata, myterms)
In [101]:
#extract the transcript ids for each GO term
var=c()
for (i in 1:length(myterms))
{
   myterm <- myterms[i]
   mygenesforterm <- mygenes[myterm][[1]]
   myfactor <- mygenesforterm %in% myInterestingGenes
   mygenesforterm2 <- mygenesforterm[myfactor == TRUE]
   mygenesforterm2 <- paste(mygenesforterm2, collapse=',')
   var[i]=paste(myterm,"genes:",mygenesforterm2)
}

#write.table(var, "BCN_GO_to_mapping.txt", sep="\t", quote=F)
In [ ]:
#removed GO enrichment terms that have the same exact set of genes 
goEnrichment_subsetted <- read_excel("GO_enrichment_GO_figure_subsetted.xlsx")
In [113]:
#plot the GO enrichment results 
ntop = 25
ggdata <- goEnrichment[1:ntop,]
ggdata$Term <- factor(ggdata$Term, levels = rev(ggdata$Term)) 
plot_BCN_GO_BP <- ggplot(ggdata,
  aes(x = Term, y = -log10(Fisher), size = Significant, fill = -log10(Fisher))) +
 ylim(-1,21) + 
  geom_point(shape = 21) +
  scale_size(range = c(2.5,12.5)) +
  scale_fill_continuous(low = 'royalblue', high = 'red4') +
labs(
    title = 'GO enrichment - BP - BCN')+
  theme_bw(base_size = 20) +
labs(size= "Number of Genes")+
  theme(
    legend.position = 'right',
    legend.background = element_rect(),
    plot.title = element_text(angle = 0, size = 20, face = 'bold', vjust = 1),
    plot.subtitle = element_text(angle = 0, size = 10, face = 'bold', vjust = 1),
    plot.caption = element_text(angle = 0, size = 12, face = 'bold', vjust = 1),
    axis.text.x = element_blank(),
    axis.text.y = element_text(angle = 0, size = 13, face = 'bold', vjust = 0.5),
    axis.title = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 8, face = 'bold'),
    axis.line = element_line(colour = 'black'),

    #Legend
    legend.key = element_blank(), 
    legend.text = element_text(size = 16, face = "bold"))+


  coord_flip()
In [114]:
plot_BCN_GO_BP + labs(x = NULL)
In [108]:
#options(repr.plot.width=14, repr.plot.height=8, repr.plot.res = 500)

Use topGO to identify enriched molecular functions in BCN

In [37]:
#run the topGO function
GOdata_MF <- new("topGOdata", ontology = "MF", allGenes = geneList,
                    annot = annFUN.gene2GO, gene2GO = geneID2GO)

results_go_MF <- runTest(GOdata_MF, algorithm="weight01", statistic="Fisher")
Building most specific GOs .....

	( 3583 GO terms found. )


Build GO DAG topology ..........

	( 3618 GO terms and 4733 relations. )


Annotating nodes ...............

	( 16040 genes annotated to the GO terms. )


			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 525 nontrivial nodes
		 parameters: 
			 test statistic: fisher


	 Level 13:	1 nodes to be scored	(0 eliminated genes)


	 Level 12:	8 nodes to be scored	(0 eliminated genes)


	 Level 11:	8 nodes to be scored	(35 eliminated genes)


	 Level 10:	13 nodes to be scored	(141 eliminated genes)


	 Level 9:	24 nodes to be scored	(260 eliminated genes)


	 Level 8:	33 nodes to be scored	(405 eliminated genes)


	 Level 7:	64 nodes to be scored	(3062 eliminated genes)


	 Level 6:	107 nodes to be scored	(3867 eliminated genes)


	 Level 5:	104 nodes to be scored	(6554 eliminated genes)


	 Level 4:	109 nodes to be scored	(9256 eliminated genes)


	 Level 3:	43 nodes to be scored	(13624 eliminated genes)


	 Level 2:	10 nodes to be scored	(14595 eliminated genes)


	 Level 1:	1 nodes to be scored	(15936 eliminated genes)

In [38]:
#retrieve the GO enrichment 
goEnrichment_MF   <- GenTable(GOdata_MF, Fisher = results_go_MF, orderBy = "Fisher", topNodes = 100, numChar=1000)
In [39]:
goEnrichment_MF$Fisher <- as.numeric(goEnrichment_MF$Fisher)
goEnrichment_MF <- goEnrichment_MF[goEnrichment_MF$Fisher < 0.05,] 
goEnrichment_MF <- goEnrichment_MF[goEnrichment_MF$Significant > 2,]
goEnrichment_MF
A data.frame: 23 × 6
GO.IDTermAnnotatedSignificantExpectedFisher
<chr><chr><int><int><dbl><dbl>
1GO:0004598peptidylamidoglycolate lyase activity 20120.241.400e-19
2GO:0004504peptidylglycine monooxygenase activity 20120.241.400e-19
3GO:0005507copper ion binding 65120.792.900e-12
4GO:0004181metallocarboxypeptidase activity 36 70.447.800e-08
5GO:0004222metalloendopeptidase activity 118 81.443.300e-05
6GO:0005509calcium ion binding 640197.824.200e-05
7GO:0046422violaxanthin de-epoxidase activity 10 30.121.300e-04
8GO:0050659N-acetylgalactosamine 4-sulfate 6-O-sulfotransferase activity 12 30.152.300e-04
9GO:0001537N-acetylgalactosamine 4-O-sulfotransferase activity 16 30.205.700e-04
11GO:0004252serine-type endopeptidase activity 241 92.949.900e-04
13GO:0018833DDT-dehydrochlorinase activity 26 30.322.440e-03
15GO:0004656procollagen-proline 4-dioxygenase activity 28 30.343.020e-03
18GO:0015485cholesterol binding 73 40.897.140e-03
19GO:0031418L-ascorbic acid binding 41 30.508.910e-03
32GO:0016757transferase activity, transferring glycosyl groups 341 54.161.045e-02
33GO:0047712Cypridina-luciferin 2-monooxygenase activity 46 30.561.222e-02
35GO:0051287NAD binding 94 41.151.692e-02
36GO:0016702oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen 63 50.771.741e-02
37GO:0004867serine-type endopeptidase inhibitor activity 95 41.161.752e-02
46GO:0004364glutathione transferase activity 61 30.742.587e-02
58GO:0051119sugar transmembrane transporter activity 66 30.813.168e-02
59GO:0004180carboxypeptidase activity 88101.073.185e-02
60GO:0004601peroxidase activity 105 41.284.051e-02
In [40]:
#plot GO MF enchriment plot 
ntop = 23
ggdata <- goEnrichment_MF[1:ntop,]
ggdata$Term <- factor(ggdata$Term, levels = rev(ggdata$Term)) 
GO_MF_BCN_plot <- ggplot(ggdata,
  aes(x = Term, y = -log10(Fisher), size = Significant, fill = -log10(Fisher))) +

  ylim(-1,21) +
  geom_point(shape = 21) +
  scale_size(range = c(2.5,12.5)) +
  scale_fill_continuous(low = 'royalblue', high = 'red4') +

  labs(
    title = 'GO Analysis - MF - BCN')+
   theme_bw(base_size = 20) +
labs(size= "Number of Genes")+
  theme(
    legend.position = 'right',
    legend.background = element_rect(),
    plot.title = element_text(angle = 0, size = 20, face = 'bold', vjust = 1),
    plot.subtitle = element_text(angle = 0, size = 10, face = 'bold', vjust = 1),
    plot.caption = element_text(angle = 0, size = 12, face = 'bold', vjust = 1),
    axis.text.x = element_blank(),
    axis.text.y = element_text(angle = 0, size = 13, face = 'bold', vjust = 0.5),
    axis.title = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 8, face = 'bold'),
    axis.line = element_line(colour = 'black'),

    #Legend
    legend.key = element_blank(), 
    legend.text = element_text(size = 16, face = "bold"))+


  coord_flip()
In [47]:
GO_MF_BCN_plot + labs(x=NULL)
In [46]:
#options(repr.plot.width=18, repr.plot.height=8, repr.plot.res = 500)

BCN network connectivity

WGCNA identifies genes integral to a (module) network using a network characteristic called module membership (Langfelder and Horvath, 2008). Module membership represents connectivity of a gene with other genes within a module and is used to define centralised hub genes (Langfelder and Horvath, 2008). Module membership (MM) is a measure ranging from 0 to 1, with higher values indicating strong connectivity within a module and lower values indicating weak connectivity (Langfelder and Horvath, 2008). Genes with similar expression patterns within a module are not only correlated with the module's overall expression (MM) but also tightly interconnected with other genes in the same module (IC) (Langfelder and Horvath, 2008). In other words, genes that are strongly correlated with the overall expression pattern of their modules (high MM) are also highly connected to other genes within those modules (high IC). There is a strong correlation between intramodule connectivity and module membership (Langfelder and Horvath, 2008).

Determine connectivity between intramodular connectivity and module membership

In [115]:
#select the same power used for WGCNA analysis in Section 4.3
ADJ1=abs(cor(datExpr,use="p"))^8
Alldegrees1=intramodularConnectivity(ADJ1, dynamicColors)
head(Alldegrees1)
A data.frame: 6 × 4
kTotalkWithinkOutkDiff
<dbl><dbl><dbl><dbl>
NODE_10000_length_2017_cov_3.60449_g7052_i01553.03050731503.634825149.39568221454.23914297
NODE_100017_length_140_cov_9.63529_g92797_i0 11.1935631 3.9747293 7.2188338 -3.24410443
NODE_10001_length_2016_cov_11.7874_g7053_i01409.75788131361.294200348.46368101312.83051935
NODE_100020_length_140_cov_8.6_g92800_i0 3.6384341 0.7672649 2.8711692 -2.10390425
NODE_100025_length_140_cov_5.88235_g92805_i0 5.9945145 1.4985511 4.4959634 -2.99741229
NODE_100026_length_140_cov_5.69412_g92806_i0 0.9903981 0.5349826 0.4554154 0.07956718
In [116]:
datME=moduleEigengenes(datExpr,dynamicColors)$eigengenes
signif(cor(datME, use="p"), 2)
A matrix: 14 × 14 of type dbl
MEblackMEblueMEbrownMEgreenMEgreenyellowMEgreyMEmagentaMEpinkMEpurpleMEredMEsalmonMEtanMEturquoiseMEyellow
MEblack 1.000-0.210-0.36-0.360-0.0430 0.1100-0.120-0.370-0.4500 0.7000-0.058 0.210 0.057-0.17
MEblue-0.210 1.000-0.26-0.130-0.0710-0.0890-0.069-0.310 0.4200-0.2800-0.150-0.160-0.680-0.35
MEbrown-0.360-0.260 1.00 0.660 0.3000 0.5200 0.410 0.450 0.4900-0.3400 0.220 0.380 0.670 0.76
MEgreen-0.360-0.130 0.66 1.000 0.3000 0.2900-0.045 0.300 0.2600-0.1600 0.540 0.230 0.460 0.75
MEgreenyellow-0.043-0.071 0.30 0.300 1.0000 0.0130 0.130 0.150-0.0025 0.1200 0.300 0.120 0.280 0.41
MEgrey 0.110-0.089 0.52 0.290 0.0130 1.0000 0.150 0.420 0.3000-0.0095-0.260 0.240 0.290 0.47
MEmagenta-0.120-0.069 0.41-0.045 0.1300 0.1500 1.000 0.270 0.1600-0.0810-0.190 0.027 0.300 0.20
MEpink-0.370-0.310 0.45 0.300 0.1500 0.4200 0.270 1.000-0.0650-0.1800-0.082 0.091 0.430 0.34
MEpurple-0.450 0.420 0.49 0.260-0.0025 0.3000 0.160-0.065 1.0000-0.5700-0.027 0.120-0.280 0.19
MEred 0.700-0.280-0.34-0.160 0.1200-0.0095-0.081-0.180-0.5700 1.0000 0.110 0.160 0.170 0.12
MEsalmon-0.058-0.150 0.22 0.540 0.3000-0.2600-0.190-0.082-0.0270 0.1100 1.000 0.170 0.280 0.44
MEtan 0.210-0.160 0.38 0.230 0.1200 0.2400 0.027 0.091 0.1200 0.1600 0.170 1.000 0.340 0.27
MEturquoise 0.057-0.680 0.67 0.460 0.2800 0.2900 0.300 0.430-0.2800 0.1700 0.280 0.340 1.000 0.69
MEyellow-0.170-0.350 0.76 0.750 0.4100 0.4700 0.200 0.340 0.1900 0.1200 0.440 0.270 0.690 1.00
In [117]:
datKME =signedKME(datExpr, datME, outputColumnName="MM.")
head(datKME)
A data.frame: 6 × 14
MM.blackMM.blueMM.brownMM.greenMM.greenyellowMM.greyMM.magentaMM.pinkMM.purpleMM.redMM.salmonMM.tanMM.turquoiseMM.yellow
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10000_length_2017_cov_3.60449_g7052_i0-0.15532029 0.97601523-0.2846444-0.144879714-0.039132553-0.1449490-0.0693439986-0.34229485 0.33625103-0.26151146-0.11164188-0.169304546-0.6317741-0.379896029
NODE_100017_length_140_cov_9.63529_g92797_i0-0.26226476-0.11715241 0.5758819 0.571758093 0.235266897 0.2928844 0.2376183428 0.31743155 0.30416925-0.07433324 0.28630390 0.240246230 0.3381483 0.572010981
NODE_10001_length_2016_cov_11.7874_g7053_i0-0.22970154 0.95923111-0.2430134-0.118696037-0.051335797-0.1268765-0.0442088696-0.30620975 0.37519349-0.29075774-0.09681686-0.111470875-0.6219299-0.362494605
NODE_100020_length_140_cov_8.6_g92800_i0 0.23556515 0.08400336-0.3229051-0.283185503 0.025327072-0.1555593-0.0005149578-0.44191396-0.16374555 0.47400625-0.11875783-0.142910689-0.1472292 0.002144837
NODE_100025_length_140_cov_5.88235_g92805_i0 0.52698437-0.29706323 0.1786959 0.062392534 0.007372184 0.2959813 0.0215938414-0.08060444 0.00623096 0.28659794 0.11489752 0.006724937 0.2851930 0.300323691
NODE_100026_length_140_cov_5.69412_g92806_i0 0.02933126-0.02126167 0.1233537 0.008712608-0.310629963 0.3781600-0.0309052488-0.17177320 0.26500061-0.08286882-0.16321921-0.137215951-0.0109511 0.077314310
In [118]:
nrow(datKME)
31097
In [18]:
#determine the module membership vs intramodular connectivity for the red module (the BCN)
which.color="red" 
restrictGenes=dynamicColors==which.color
verboseScatterplot(Alldegrees1$kWithin[ restrictGenes],
(datKME[restrictGenes, paste("MM.", which.color, sep="")])^10,
col=which.color,
xlab="Intramodular Connectivity",
ylab="(Module Membership)^8")
In [17]:
#function from the R package repr to visualize figures in Jupyter Notebook(optional)
options(repr.plot.width=8, repr.plot.height=7)

Identify genes in the BCN with highest intramodular connectivity

Identify all co-expressed transcripts within the BCN (red module) that have a high module membership (MM > 0.8). An MM value of 0.8 or higher indicates a strong association with the module eigengene, suggesting that the gene plays a central role in the module's network and can be considered a hub gene (Langfelder and Horvath, 2008).

In [120]:
combined_datKME_ADJ1 = merge(datKME, Alldegrees1, by=0)
In [121]:
head(combined_datKME_ADJ1)
A data.frame: 6 × 19
Row.namesMM.blackMM.blueMM.brownMM.greenMM.greenyellowMM.greyMM.magentaMM.pinkMM.purpleMM.redMM.salmonMM.tanMM.turquoiseMM.yellowkTotalkWithinkOutkDiff
<I<chr>><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
1NODE_1_length_23329_cov_11.1744_g0_i0 0.22666782 0.04040074-0.19187472-0.05986384-0.14137351-0.15607518-0.15311363-0.4167876 0.05360364 0.33473461 0.25271613 0.1551101-0.1389397 0.005845331 4.438268 1.473735 2.964533 -1.490798
2NODE_10_length_13445_cov_1.1233_g7_i0 0.19910742-0.42991626 0.07236738-0.03095044 0.02067577 0.18458297 0.17627597 0.3534248-0.48570092 0.31624714 0.11926391 0.1700352 0.4722523 0.154050407 16.797557 11.049569 5.747989 5.301580
3NODE_100_length_8069_cov_3.11705_g65_i0 -0.09566595 0.58874615-0.30267560-0.11441082-0.16633844-0.05056483-0.13963583-0.5034320 0.46145430-0.14571793-0.20310799-0.1772821-0.7030341-0.279742188 186.572675 154.12467932.447996 121.676683
4NODE_10000_length_2017_cov_3.60449_g7052_i0 -0.15532029 0.97601523-0.28464441-0.14487971-0.03913255-0.14494902-0.06934400-0.3422948 0.33625103-0.26151146-0.11164188-0.1693045-0.6317741-0.3798960291553.0305071503.63482549.3956821454.239143
5NODE_10001_length_2016_cov_11.7874_g7053_i0 -0.22970154 0.95923111-0.24301340-0.11869604-0.05133580-0.12687646-0.04420887-0.3062098 0.37519349-0.29075774-0.09681686-0.1114709-0.6219299-0.3624946051409.7578811361.29420048.4636811312.830519
6NODE_100017_length_140_cov_9.63529_g92797_i0-0.26226476-0.11715241 0.57588185 0.57175809 0.23526690 0.29288445 0.23761834 0.3174316 0.30416925-0.07433324 0.28630390 0.2402462 0.3381483 0.572010981 11.193563 3.974729 7.218834 -3.244104
In [122]:
nrow(combined_datKME_ADJ1)
31097
In [123]:
subset_combined_datKME_ADJ1 = dplyr::select(combined_datKME_ADJ1, c("Row.names","MM.red", "kWithin","kTotal"))
In [124]:
head(subset_combined_datKME_ADJ1)
A data.frame: 6 × 4
Row.namesMM.redkWithinkTotal
<I<chr>><dbl><dbl><dbl>
1NODE_1_length_23329_cov_11.1744_g0_i0 0.33473461 1.473735 4.438268
2NODE_10_length_13445_cov_1.1233_g7_i0 0.31624714 11.049569 16.797557
3NODE_100_length_8069_cov_3.11705_g65_i0 -0.14571793 154.124679 186.572675
4NODE_10000_length_2017_cov_3.60449_g7052_i0 -0.261511461503.6348251553.030507
5NODE_10001_length_2016_cov_11.7874_g7053_i0 -0.290757741361.2942001409.757881
6NODE_100017_length_140_cov_9.63529_g92797_i0-0.07433324 3.974729 11.193563
In [125]:
nrow(subset_combined_datKME_ADJ1)
31097
In [126]:
#make the transcript ids rownames again
row.names(subset_combined_datKME_ADJ1) <- subset_combined_datKME_ADJ1$Row.names
subset_combined_datKME_ADJ1[1] <- NULL
In [127]:
head(subset_combined_datKME_ADJ1)
A data.frame: 6 × 3
MM.redkWithinkTotal
<dbl><dbl><dbl>
NODE_1_length_23329_cov_11.1744_g0_i0 0.33473461 1.473735 4.438268
NODE_10_length_13445_cov_1.1233_g7_i0 0.31624714 11.049569 16.797557
NODE_100_length_8069_cov_3.11705_g65_i0-0.14571793 154.124679 186.572675
NODE_10000_length_2017_cov_3.60449_g7052_i0-0.261511461503.6348251553.030507
NODE_10001_length_2016_cov_11.7874_g7053_i0-0.290757741361.2942001409.757881
NODE_100017_length_140_cov_9.63529_g92797_i0-0.07433324 3.974729 11.193563
In [128]:
#subset the subset_combined_datKME_ADJ1 to include just transcripts from the red module (the BCN)
top_datKME_ADJ1_.8 <-subset_combined_datKME_ADJ1 %>% dplyr::filter(subset_combined_datKME_ADJ1$MM.red >= 0.79)
In [129]:
head(top_datKME_ADJ1_.8)
A data.frame: 6 × 3
MM.redkWithinkTotal
<dbl><dbl><dbl>
NODE_10321_length_1972_cov_1770.41_g3092_i10.801098022.1819631.65077
NODE_10505_length_1950_cov_293.099_g7393_i00.912165147.4948855.68645
NODE_10519_length_1949_cov_2.1246_g7404_i00.904365849.0630952.70805
NODE_106809_length_128_cov_1412.44_g99589_i00.920736081.5474589.02335
NODE_10748_length_1916_cov_1427.62_g3092_i40.790717720.0418828.97377
NODE_10872_length_1900_cov_8.46721_g7639_i00.803511920.6915026.06230
In [130]:
#make rowname a column for downstream subsetting 
top_datKME_ADJ1_.8$transcript_id <- rownames(top_datKME_ADJ1_.8)
In [131]:
head(top_datKME_ADJ1_.8)
A data.frame: 6 × 4
MM.redkWithinkTotaltranscript_id
<dbl><dbl><dbl><chr>
NODE_10321_length_1972_cov_1770.41_g3092_i10.801098022.1819631.65077NODE_10321_length_1972_cov_1770.41_g3092_i1
NODE_10505_length_1950_cov_293.099_g7393_i00.912165147.4948855.68645NODE_10505_length_1950_cov_293.099_g7393_i0
NODE_10519_length_1949_cov_2.1246_g7404_i00.904365849.0630952.70805NODE_10519_length_1949_cov_2.1246_g7404_i0
NODE_106809_length_128_cov_1412.44_g99589_i00.920736081.5474589.02335NODE_106809_length_128_cov_1412.44_g99589_i0
NODE_10748_length_1916_cov_1427.62_g3092_i40.790717720.0418828.97377NODE_10748_length_1916_cov_1427.62_g3092_i4
NODE_10872_length_1900_cov_8.46721_g7639_i00.803511920.6915026.06230NODE_10872_length_1900_cov_8.46721_g7639_i0
In [132]:
nrow(top_datKME_ADJ1_.8)
233
In [133]:
#reorganize the column orders

top_datKME_ADJ1_.8_column_reordered <- top_datKME_ADJ1_.8[, c("transcript_id", "MM.red", "kWithin", "kTotal")] 

head(top_datKME_ADJ1_.8_column_reordered)
A data.frame: 6 × 4
transcript_idMM.redkWithinkTotal
<chr><dbl><dbl><dbl>
NODE_10321_length_1972_cov_1770.41_g3092_i1NODE_10321_length_1972_cov_1770.41_g3092_i1 0.801098022.1819631.65077
NODE_10505_length_1950_cov_293.099_g7393_i0NODE_10505_length_1950_cov_293.099_g7393_i0 0.912165147.4948855.68645
NODE_10519_length_1949_cov_2.1246_g7404_i0NODE_10519_length_1949_cov_2.1246_g7404_i0 0.904365849.0630952.70805
NODE_106809_length_128_cov_1412.44_g99589_i0NODE_106809_length_128_cov_1412.44_g99589_i00.920736081.5474589.02335
NODE_10748_length_1916_cov_1427.62_g3092_i4NODE_10748_length_1916_cov_1427.62_g3092_i4 0.790717720.0418828.97377
NODE_10872_length_1900_cov_8.46721_g7639_i0NODE_10872_length_1900_cov_8.46721_g7639_i0 0.803511920.6915026.06230
In [134]:
# reorder spreadsheet based on MM.red column from largest to smallest 

top_datKME_ADJ1_.8_column_reordered_desc <-top_datKME_ADJ1_.8_column_reordered %>% dplyr::arrange(desc(kWithin))
In [135]:
head(top_datKME_ADJ1_.8_column_reordered_desc)
A data.frame: 6 × 4
transcript_idMM.redkWithinkTotal
<chr><dbl><dbl><dbl>
NODE_37894_length_424_cov_305.016_g30731_i0NODE_37894_length_424_cov_305.016_g30731_i00.939554291.0726499.28030
NODE_35036_length_468_cov_206.31_g27935_i0NODE_35036_length_468_cov_206.31_g27935_i0 0.946163890.1002699.41706
NODE_46791_length_332_cov_515.13_g39571_i0NODE_46791_length_332_cov_515.13_g39571_i0 0.944366889.5124099.56021
NODE_48716_length_318_cov_339.129_g41496_i0NODE_48716_length_318_cov_339.129_g41496_i00.934903588.9331595.75253
NODE_38913_length_410_cov_355.231_g31740_i0NODE_38913_length_410_cov_355.231_g31740_i00.942873887.6412695.70440
NODE_34295_length_481_cov_68.2207_g27206_i0NODE_34295_length_481_cov_68.2207_g27206_i00.930952787.0328796.07521
In [59]:
#annotate the top datKME genes 
top_datKME_ADJ1_.8_column_reordered_desc_annot_trinotate <- left_join(top_datKME_ADJ1_.8_column_reordered_desc,Trinotate_lym_subset,by="transcript_id")
In [6]:
head(top_datKME_ADJ1_.8_column_reordered_desc_annot_trinotate)
A data.frame: 6 × 19
transcript_idMM.redkWithinkTotal#gene_idsprot_Top_BLASTX_hitRNAMMERprot_idprot_coordssprot_Top_BLASTP_hitPfamSignalPTmHMMeggnogKegggene_ontology_blastgene_ontology_pfamtranscriptpeptide
<chr><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>
1NODE_37894_length_424_cov_305.016_g30731_i00.939554291.0726499.28030g30731..............
2NODE_35036_length_468_cov_206.31_g27935_i0 0.946163890.1002699.41706g27935..............
3NODE_46791_length_332_cov_515.13_g39571_i0 0.944366889.5124099.56021g39571..............
4NODE_48716_length_318_cov_339.129_g41496_i00.934903588.9331595.75253g41496..............
5NODE_38913_length_410_cov_355.231_g31740_i00.942873887.6412695.70440g31740..............
6NODE_34295_length_481_cov_68.2207_g27206_i00.930952787.0328796.07521g27206..............
In [26]:
#write.csv(top_datKME_ADJ1_.8_column_reordered_desc_annot_trinotate, file = "top_datKME_ADJ1_.8_desc_BCN.csv")

Identify genes in the bioluminescent upper lip with highest intramodular connectivity

Use the results from the DGE analysis in Section Step 9.4 to identify the significantly upregulated genes (expressed uniquely) with the highest module membership (MM > 0.8).

In [56]:
#significantly upregulated genes (expressed uniquely)from section 9.4.4
colnames(unique_genes_bio_upper_lip_info)
  1. 'transcript_id'
  2. 'baseMean'
  3. 'log2FoldChange'
  4. 'lfcSE'
  5. 'stat'
  6. 'pvalue'
  7. 'padj'
In [57]:
#change the gene column name to transcript ID
colnames(unique_genes_bio_upper_lip_info)[1] <- "transcript_id"
In [60]:
#determine how many of the co-expressed genes in the BCN with MM > 0.8 are significantly upregulated in the bioluminescent upper lip

top_datKME_ADJ1_.8_column_reordered_desc_annot_DE_bio_upper_lip <- left_join(top_datKME_ADJ1_.8_column_reordered_desc_annot_trinotate,unique_genes_bio_upper_lip_info,by="transcript_id")
In [7]:
head(top_datKME_ADJ1_.8_column_reordered_desc_annot_DE_bio_upper_lip)
A data.frame: 6 × 25
transcript_idMM.redkWithinkTotal#gene_idsprot_Top_BLASTX_hitRNAMMERprot_idprot_coordssprot_Top_BLASTP_hitgene_ontology_blastgene_ontology_pfamtranscriptpeptidebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><dbl><dbl><dbl><dbl><dbl><dbl>
1NODE_37894_length_424_cov_305.016_g30731_i00.939554291.0726499.28030g30731.........NANANANANANA
2NODE_35036_length_468_cov_206.31_g27935_i0 0.946163890.1002699.41706g27935.........NANANANANANA
3NODE_46791_length_332_cov_515.13_g39571_i0 0.944366889.5124099.56021g39571.........NANANANANANA
4NODE_48716_length_318_cov_339.129_g41496_i00.934903588.9331595.75253g41496.........NANANANANANA
5NODE_38913_length_410_cov_355.231_g31740_i00.942873887.6412695.70440g31740.........NANANANANANA
6NODE_34295_length_481_cov_68.2207_g27206_i00.930952787.0328796.07521g27206.........NANANANANANA
In [127]:
#write.csv(top_datKME_ADJ1_.8_column_reordered_desc_annot_DE_bio_upper_lip , file = "top_datKME_ADJ1_.8_annot_DE_unique_Bio_UpperLip.csv")

QC for Differential Gene Expression

Import V.tsujii gene expression matrix which consists of three tissue types - upper lip, compound eye and gut - with five biological replicates for each tissue. Generate the sample name sheet (meta sheet) for downstream DESeq2 analyses.

In [3]:
#read in the gene expression matrix 
organ_level_vt <- read.delim("combined_vargula_tsujii.tab", header = TRUE, sep = "\t", quote = "")
In [4]:
head(organ_level_vt)
A data.frame: 6 × 17
XVt.1A.counts.tabVt.1B.counts.tabVt.1C.counts.tabVt.2A.counts.tabVt.2B.counts.tabVt.2C.counts.tabVt.3A.counts.tabVt.3B.counts.tabVt.3C.counts.tabVt.4A.counts.tabVt.4B.counts.tabVt.4C.counts.tabVt.5A.counts.tabVt.5B.counts.tabVt.5C.counts.tabX.1
<chr><int><int><int><int><int><int><int><int><int><int><int><int><int><int><int><lgl>
1NODE_100001_length_141_cov_1.03488_g92781_i0 00 0 0 0 0 0 0 0 0 0 0 00 1NA
2NODE_10000_length_2017_cov_3.60449_g7052_i0 191 8 6 4 814015 9029 13532631NA
3NODE_100017_length_140_cov_9.63529_g92797_i0 60 2 1 0 5 5 0 9 0 1 7 20 9NA
4NODE_10001_length_2016_cov_11.7874_g7053_i0 232294820180662322040305149285NA
5NODE_100020_length_140_cov_8.6_g92800_i0 00 0 2 0 0 0 0 0 0 0 0 00 0NA
6NODE_100021_length_140_cov_8.04706_g92801_i0 00 0 0 0 1 0 0 3 0 0 2 10 0NA
In [5]:
#remove the extra column that got inserted 
row.names(organ_level_vt) <-organ_level_vt$X
organ_level_vt[1]<-NULL
In [6]:
organ_level_vt$X.1 <- NULL
In [7]:
#import the sample sheet
sampleinfo <- read_excel("sampleinfo_vtsujii_DGE.xlsx")
In [8]:
sampleinfo
A tibble: 15 × 2
samplesgroup
<chr><chr>
Vt.1A.counts.tabUpper_lip
Vt.1B.counts.tabEyes
Vt.1C.counts.tabGut
Vt.2A.counts.tabUpper_lip
Vt.2B.counts.tabEyes
Vt.2C.counts.tabGut
Vt.3A.counts.tabUpper_lip
Vt.3B.counts.tabEyes
Vt.3C.counts.tabGut
Vt.4A.counts.tabUpper_lip
Vt.4B.counts.tabEyes
Vt.4C.counts.tabGut
Vt.5A.counts.tabUpper_lip
Vt.5B.counts.tabEyes
Vt.5C.counts.tabGut
In [9]:
#generate the DESeq data set 
dds_count_table_organ_level <- DESeqDataSetFromMatrix(countData = organ_level_vt, colData = sampleinfo, design = ~group)
Warning message in DESeqDataSet(se, design = design, ignoreRank):
“some variables in design formula are characters, converting to factors”
In [10]:
#run DESeq2 function and normalization  
dds_vtsujii <- DESeq(dds_count_table_organ_level, betaPrior = FALSE, parallel = TRUE)
estimating size factors

estimating dispersions

gene-wise dispersion estimates: 1 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 1 workers

In [11]:
#perform a variance-stabilizing transformation
dds_vtsujii_vsd <- varianceStabilizingTransformation(dds_vtsujii)
In [ ]:
#transpose the matrix 
sampleDists_dds_vtsujii_vsd <- dist(t(assay(dds_vtsujii_vsd)))
In [13]:
#plot the heatmap
sampleDistMatrix_dds_vtsujii_vsd <- as.matrix(sampleDists_dds_vtsujii_vsd)
rownames(sampleDistMatrix_dds_vtsujii_vsd) <- paste(colData(dds_count_table_organ_level)$group) 
colnames(sampleDistMatrix_dds_vtsujii_vsd) <- colData(dds_count_table_organ_level)$samples
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix_dds_vtsujii_vsd,
          clustering_distance_rows=sampleDists_dds_vtsujii_vsd,
         clustering_distance_cols=sampleDists_dds_vtsujii_vsd,
         col=colors)
In [15]:
options(repr.plot.width=8, repr.plot.height=6, repr.plot.res = 150)
In [16]:
pcaData <- plotPCA(dds_vtsujii_vsd, intgroup="group", returnData=TRUE)
percentVar <- round(100 * attr(pcaData, "percentVar"))
ggplot(pcaData, aes(PC1, PC2, color=group, shape=group)) +
labs(color = "Tissue Types")+ labs(shape = "Tissue Types")+
  geom_point(size=5) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
   geom_point(size=5) +
theme(panel.grid.major = element_line(colour = "gray97",  size = 1), panel.grid.minor = element_line(linetype = "dotted"), panel.background = element_rect(fill = NA), 
   legend.key = element_rect(fill = "gray100")) + theme(axis.line = element_line(size = 0.5,linetype = "solid")) + 
  theme(panel.border = element_rect(colour = "black", fill=NA, size=1)) +
coord_fixed() +
scale_color_manual(values = c('#F2C93D','#C97D97','#AC97C9'))+theme(
    legend.title = element_text(size = 16),
    legend.text = element_text(size = 14),  
    axis.title.x = element_text(size = 16),  
    axis.title.y = element_text(size = 16),  
    axis.text = element_text(size = 12)  
  )

Differential gene expression

Differential gene expression analysis was carried out in DESeq2 (Love et al., 2014). For Vargula tsujii , we determined differentially upregulated genes in three tissue types - bioluminescent upper lip, compound eye and gut - using five biological replicates for each tissue. DESeq2 was employed using a p-value < 0.05 and FC > 1.5 for the significance of differentially expressed genes using the Benjamini-Hochberg method to account for false discovery rate (FDR). Pairwise comparisons were done across tissue types (i.e., bioluminescent upper lip to compound eye, bioluminescent upper lip to gut, gut to compound eye). To identify tissue-specific differential expression (i.e. significantly upregulated genes that are uniquely expressed), each tissue was compared to the other two. For example, the expression in the bioluminescent upper lip was determined by comparing it to both the compound eye and the gut tissues.For each pairwise comparison, the reference tissue was specified to determine the significantly upregulated genes in each tissue type (i.e., positive vs negative logfold change).

In [17]:
#run DESeq2 function and normalization  
dds_vtsujii <- DESeq(dds_count_table_organ_level)
estimating size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

DGE - Gut vs Compound Eye

In [18]:
#set Eyes as reference tissue
dds_vtsujii$group <- relevel(dds_vtsujii$group, ref= "Eyes")
In [19]:
#rerun DESeq command after reference is specified
dds_vtsujii <- DESeq(dds_vtsujii)
using pre-existing size factors

estimating dispersions

found already estimated dispersions, replacing these

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

In [21]:
#define contrasts, extract results table, and shrink the log2 fold changes

res_tableOE_unshrunken_Gut_Vs_Eye <- results(dds_vtsujii, contrast= c("group", "Gut", "Eyes") , alpha = 0.05)


res_tableOE_Gut_Vs_Eye  <- lfcShrink(dds_vtsujii, contrast= c("group", "Gut", "Eyes"), res = res_tableOE_unshrunken_Gut_Vs_Eye )
using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).

Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
Reference: https://doi.org/10.1093/bioinformatics/bty895

In [22]:
mcols(res_tableOE_Gut_Vs_Eye, use.names=T)
DataFrame with 6 rows and 2 columns
                       type                               description
                <character>                               <character>
baseMean       intermediate mean of normalized counts for all samples
log2FoldChange      results log2 fold change (MAP): group Gut vs Eyes
lfcSE               results         standard error: group Gut vs Eyes
stat                results         Wald statistic: group Gut vs Eyes
pvalue              results      Wald test p-value: group Gut vs Eyes
padj                results                      BH adjusted p-values
In [23]:
# set thresholds
# lfc.cutoff value of 0.58 translates to a 1.5 log2 fold change 
# padj.cutoff value of 0.05 

padj.cutoff <- 0.05
lfc.cutoff <- 0.58
In [24]:
res_tableOE_Gut_Vs_Eye_tb <- res_tableOE_Gut_Vs_Eye %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>% 
  as_tibble()
In [25]:
#determine all differentially expressed genes 
sigOE_Gut_Vs_Eye <- res_tableOE_Gut_Vs_Eye_tb %>%
        filter(padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)
In [26]:
#extract all genes that are significantly upregulated in the Gut (positive log2 fold change)
sigOE_UPREGULATED_logfold_Gut_vs_Eye <- sigOE_Gut_Vs_Eye %>%
        filter(padj < padj.cutoff & log2FoldChange > lfc.cutoff)
In [27]:
head(sigOE_UPREGULATED_logfold_Gut_vs_Eye)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10010_length_2015_cov_7.32245_g7057_i0 7.4816914.3290051.19867233.6464322.659066e-043.236611e-03
NODE_10015_length_2014_cov_682.243_g7060_i0 570.3455457.1526040.95577687.4040691.320736e-132.149895e-11
NODE_10040_length_2011_cov_2.41973_g7075_i0 14.8864865.9607081.25773134.6833272.822558e-066.652356e-05
NODE_10047_length_2010_cov_1.26305_g7081_i0 2.1085493.0922971.11612362.7754705.512192e-033.450172e-02
NODE_100534_length_140_cov_1.47059_g93314_i0 21.1810266.4151441.33791754.6755692.931403e-066.869105e-05
NODE_1006_length_4905_cov_1.0567_g707_i0 2.2374503.4829541.22488122.8148334.880265e-033.151585e-02
In [28]:
colnames(sigOE_UPREGULATED_logfold_Gut_vs_Eye)[1]<- "transcript_id"
In [29]:
#add annotation
sigOE_UPREGULATED_logfold_Gut_vs_Eye_annot <- left_join(sigOE_UPREGULATED_logfold_Gut_vs_Eye,Trinotate_lym_subset,by="transcript_id")
In [8]:
#genes that are significantly upregulated in the Gut (positive log2fold change)
head(sigOE_UPREGULATED_logfold_Gut_vs_Eye_annot)
A tibble: 6 × 22
transcript_idbaseMeanlog2FoldChangelfcSEstatpvaluepadj#gene_idsprot_Top_BLASTX_hitRNAMMERsprot_Top_BLASTP_hitPfamSignalPTmHMMeggnogKegggene_ontology_blastgene_ontology_pfamtranscriptpeptide
<chr><dbl><dbl><dbl><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>
NODE_10010_length_2015_cov_7.32245_g7057_i0 7.4816914.3290051.19867233.6464322.659066e-043.236611e-03g7057 . .. . . . . . . . ..
NODE_10015_length_2014_cov_682.243_g7060_i0 570.3455457.1526040.95577687.4040691.320736e-132.149895e-11g7060 YM9I_CAEEL^YM9I_CAEEL^Q:165-1547,H:47-510^35.789%ID^E:1.6e-90^RecName: Full=Putative serine protease F56F10.1;^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis .YM9I_CAEEL^YM9I_CAEEL^Q:35-495,H:47-510^35.789%ID^E:7.25e-93^RecName: Full=Putative serine protease F56F10.1;^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis PF05577.12^Peptidase_S28^Serine carboxypeptidase S28^52-486^E:8e-146sigP:1^21^0.762^YES. ENOG410XSGG^protease, serine, 16 (thymus)KEGG:cel:CELE_F56F10.1 GO:0045121^cellular_component^membrane raft`GO:0008239^molecular_function^dipeptidyl-peptidase activity`GO:0008236^molecular_function^serine-type peptidase activity`GO:0045087^biological_process^innate immune response`GO:0006508^biological_process^proteolysis GO:0008236^molecular_function^serine-type peptidase activity`GO:0006508^biological_process^proteolysis..
NODE_10040_length_2011_cov_2.41973_g7075_i0 14.8864865.9607081.25773134.6833272.822558e-066.652356e-05g7075 TSN9_DANRE^TSN9_DANRE^Q:157-810,H:8-230^23.556%ID^E:7.86e-12^RecName: Full=Tetraspanin-9;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Ostariophysi; Cypriniformes; Cyprinidae; Danio .TSN9_DANRE^TSN9_DANRE^Q:12-229,H:8-230^26.222%ID^E:3.91e-25^RecName: Full=Tetraspanin-9;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Ostariophysi; Cypriniformes; Cyprinidae; Danio PF00335.20^Tetraspanin^Tetraspanin family^13-226^E:3.8e-43 . ExpAA=90.13^PredHel=4^Topology=i17-39o59-81i88-110o203-225iENOG4111IRY^tetraspanin KEGG:dre:431733`KO:K17350GO:0005887^cellular_component^integral component of plasma membrane GO:0016021^cellular_component^integral component of membrane ..
NODE_10047_length_2010_cov_1.26305_g7081_i0 2.1085493.0922971.11612362.7754705.512192e-033.450172e-02g7081 DOCK7_HUMAN^DOCK7_HUMAN^Q:1022-1264,H:1900-1980^69.136%ID^E:6.59e-98^RecName: Full=Dedicator of cytokinesis protein 7;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo`DOCK7_HUMAN^DOCK7_HUMAN^Q:640-1020,H:1772-1899^53.03%ID^E:6.59e-98^RecName: Full=Dedicator of cytokinesis protein 7;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo`DOCK7_HUMAN^DOCK7_HUMAN^Q:1245-1484,H:1974-2053^60%ID^E:6.59e-98^RecName: Full=Dedicator of cytokinesis protein 7;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo`DOCK7_HUMAN^DOCK7_HUMAN^Q:1495-1617,H:2056-2096^70.732%ID^E:6.59e-98^RecName: Full=Dedicator of cytokinesis protein 7;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo`DOCK7_HUMAN^DOCK7_HUMAN^Q:3-818,H:1561-1830^60.071%ID^E:8.86e-91^RecName: Full=Dedicator of cytokinesis protein 7;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo`DOCK7_HUMAN^DOCK7_HUMAN^Q:449-730,H:1712-1801^46.809%ID^E:1.82e-11^RecName: Full=Dedicator of cytokinesis protein 7;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo.DOCK7_HUMAN^DOCK7_HUMAN^Q:1-160,H:1561-1720^83.125%ID^E:1.05e-85^RecName: Full=Dedicator of cytokinesis protein 7;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo PF06920.13^DHR-2^Dock homology region 2^21-164^E:1.1e-46 . . ENOG410XNVY^Dedicator of cytokinesis KEGG:hsa:85440`KO:K21852 GO:0030424^cellular_component^axon`GO:0045178^cellular_component^basal part of cell`GO:0005925^cellular_component^focal adhesion`GO:0030426^cellular_component^growth cone`GO:0043005^cellular_component^neuron projection`GO:0005085^molecular_function^guanyl-nucleotide exchange factor activity`GO:0048365^molecular_function^Rac GTPase binding`GO:0090630^biological_process^activation of GTPase activity`GO:0007409^biological_process^axonogenesis`GO:0045200^biological_process^establishment of neuroblast polarity`GO:0022027^biological_process^interkinetic nuclear migration`GO:0000226^biological_process^microtubule cytoskeleton organization`GO:0120163^biological_process^negative regulation of cold-induced thermogenesis`GO:0031175^biological_process^neuron projection development`GO:0033138^biological_process^positive regulation of peptidyl-serine phosphorylation`GO:1904754^biological_process^positive regulation of vascular associated smooth muscle cell migration`GO:0050767^biological_process^regulation of neurogenesis`GO:0007264^biological_process^small GTPase mediated signal transduction. ..
NODE_100534_length_140_cov_1.47059_g93314_i0 21.1810266.4151441.33791754.6755692.931403e-066.869105e-05g93314. .. . . . . . . . ..
NODE_1006_length_4905_cov_1.0567_g707_i0 2.2374503.4829541.22488122.8148334.880265e-033.151585e-02g707 THADA_CANLF^THADA_CANLF^Q:2328-1813,H:1149-1318^33.146%ID^E:4.07e-13^RecName: Full=Thyroid adenoma-associated protein homolog;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Carnivora; Caniformia; Canidae; Canis .THADA_CHLAE^THADA_CHLAE^Q:120-517,H:302-769^19.579%ID^E:2.77e-07^RecName: Full=Thyroid adenoma-associated protein homolog;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Cercopithecidae; Cercopithecinae; Chlorocebus. . . . . . . ..
In [31]:
#extract all genes that are significantly upregulated in the Compound Eye (negative log2fold change) but that are downregulated in the Gut. 
sigOE_DOWNREGULATED_logfold_Gut_vs_Eye <- sigOE_Gut_Vs_Eye %>%
        filter(padj < padj.cutoff & log2FoldChange < lfc.cutoff)
In [3]:
#save these two dataframes for downstream analysis in Section 9.4

#genes that are significantly upregulated in the Gut (positive log2fold change)
head(sigOE_UPREGULATED_logfold_Gut_vs_Eye)

#genes that are significantly upregulated in the Compound Eye (negative log2fold change) but that are downregulated in the Gut. 
head(sigOE_DOWNREGULATED_logfold_Gut_vs_Eye)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10010_length_2015_cov_7.32245_g7057_i0 7.4816914.3290051.19867233.6464322.659066e-043.236611e-03
NODE_10015_length_2014_cov_682.243_g7060_i0 570.3455457.1526040.95577687.4040691.320736e-132.149895e-11
NODE_10040_length_2011_cov_2.41973_g7075_i0 14.8864865.9607081.25773134.6833272.822558e-066.652356e-05
NODE_10047_length_2010_cov_1.26305_g7081_i0 2.1085493.0922971.11612362.7754705.512192e-033.450172e-02
NODE_100534_length_140_cov_1.47059_g93314_i0 21.1810266.4151441.33791754.6755692.931403e-066.869105e-05
NODE_1006_length_4905_cov_1.0567_g707_i0 2.2374503.4829541.22488122.8148334.880265e-033.151585e-02
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10030_length_2012_cov_88.44_g145_i5 77.516326-2.0043520.7483891-2.6772667.422565e-034.264394e-02
NODE_1009_length_4897_cov_2.69228_g709_i0 18.842542-1.1065160.4161180-2.6597357.820226e-034.417345e-02
NODE_10106_length_2001_cov_20.3505_g7125_i0 32.497969-3.7863031.2661859-3.0125302.590799e-031.957288e-02
NODE_10108_length_2001_cov_9.98304_g7126_i0 28.846375-4.7706260.7722330-6.1236489.145690e-105.682196e-08
NODE_101175_length_138_cov_7.48193_g93955_i0 3.923975-3.8243711.2262699-3.0757522.099721e-031.670540e-02
NODE_10126_length_1999_cov_2.60802_g7138_i0 15.963167-5.2191880.9667613-5.3075571.111044e-073.921827e-06

DGE - Bioluminescent Upper Lip vs Compound Eye

In [33]:
#define contrasts, extract results table, and shrink the log2 fold changes

res_tableOE_unshrunken_Bio_UpperLip_Vs_Eye <- results(dds_vtsujii, contrast= c("group", "Upper_lip", "Eyes") , alpha = 0.05)


res_tableOE_Bio_UpperLip_Vs_Eye <- lfcShrink(dds_vtsujii, contrast= c("group", "Upper_lip", "Eyes"), res = res_tableOE_unshrunken_Bio_UpperLip_Vs_Eye)
using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).

Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
Reference: https://doi.org/10.1093/bioinformatics/bty895

In [34]:
mcols(res_tableOE_Bio_UpperLip_Vs_Eye, use.names=T)
DataFrame with 6 rows and 2 columns
                       type                                     description
                <character>                                     <character>
baseMean       intermediate       mean of normalized counts for all samples
log2FoldChange      results log2 fold change (MAP): group Upper_lip vs Eyes
lfcSE               results         standard error: group Upper_lip vs Eyes
stat                results         Wald statistic: group Upper lip vs Eyes
pvalue              results      Wald test p-value: group Upper lip vs Eyes
padj                results                            BH adjusted p-values
In [35]:
res_tableOE_Bio_UpperLip_Vs_Eye_tb <- res_tableOE_Bio_UpperLip_Vs_Eye %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>% 
  as_tibble()
In [36]:
#determine all differentially expressed genes 
sigOE_Bio_UpperLip_Vs_Eye <- res_tableOE_Bio_UpperLip_Vs_Eye_tb %>%
        filter(padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)
In [37]:
#extract all genes that are significantly upregulated in the bioluminescent upper lip (positive log2 fold change)
sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye <- sigOE_Bio_UpperLip_Vs_Eye %>%
        filter(padj < padj.cutoff & log2FoldChange > lfc.cutoff)
In [38]:
head(sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10010_length_2015_cov_7.32245_g7057_i0 7.4816914.2448361.20231983.5702753.566072e-040.0130704556
NODE_10049_length_2009_cov_1010.67_g4245_i1144.9421193.6968330.81408524.5291835.921214e-060.0004848792
NODE_10075_length_2006_cov_50.2255_g7102_i0 47.5412142.1079240.68383913.0815622.059176e-030.0439239389
NODE_10110_length_2001_cov_7.02312_g7128_i0 10.2969264.4313081.02219654.3947011.109251e-050.0008282910
NODE_10314_length_1973_cov_79.1867_g7270_i0 51.5047331.7009030.51996233.2693441.077972e-030.0286489771
NODE_1031_length_4873_cov_1.60834_g724_i0 14.3213164.1184641.04671623.9491327.843501e-050.0040845539
In [39]:
colnames(sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye)[1] <- "transcript_id"
In [40]:
#add annotation
sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye_annot <- left_join(sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye,Trinotate_lym_subset,by="transcript_id")
In [10]:
#genes that are significantly upregulated in Bioluminescent Upper Lip (positive log2fold change)
head(sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye_annot)
A tibble: 6 × 22
transcript_idbaseMeanlog2FoldChangelfcSEstatpvaluepadj#gene_idsprot_Top_BLASTX_hitRNAMMERsprot_Top_BLASTP_hitPfamSignalPTmHMMeggnogKegggene_ontology_blastgene_ontology_pfamtranscriptpeptide
<chr><dbl><dbl><dbl><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>
NODE_10010_length_2015_cov_7.32245_g7057_i0 7.4816914.2448361.20231983.5702753.566072e-040.0130704556g7057. .. . ... . . . ..
NODE_10049_length_2009_cov_1010.67_g4245_i1144.9421193.6968330.81408524.5291835.921214e-060.0004848792g4245ATS16_MOUSE^ATS16_MOUSE^Q:1751-417,H:93-572^25.662%ID^E:3.7e-36^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs 16;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus .ADT1_CAEEL^ADT1_CAEEL^Q:53-403,H:141-533^27.114%ID^E:4.13e-35^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs adt-1 {ECO:0000305};^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis PF13688.6^Reprolysin_5^Metallo-peptidase family M12^131-300^E:5.7e-12`PF01421.19^Reprolysin^Reprolysin (M12B) family zinc metalloprotease^135-323^E:1.9e-15`PF13582.6^Reprolysin_3^Metallo-peptidase family M12B Reprolysin-like^142-274^E:4.1e-09`PF13583.6^Reprolysin_4^Metallo-peptidase family M12B Reprolysin-like^200-286^E:3.1e-06`PF13574.6^Reprolysin_2^Metallo-peptidase family M12B Reprolysin-like^219-311^E:3.9e-08`PF17771.1^ADAM_CR_2^ADAM cysteine-rich domain^339-403^E:1.2e-09`PF17771.1^ADAM_CR_2^ADAM cysteine-rich domain^430-498^E:5.6e-05..ENOG41104P0^Thrombospondin type 1 domain KEGG:cel:CELE_C02B4.1 GO:0005576^cellular_component^extracellular region`GO:0046872^molecular_function^metal ion binding`GO:0004222^molecular_function^metalloendopeptidase activity GO:0004222^molecular_function^metalloendopeptidase activity`GO:0006508^biological_process^proteolysis ..
NODE_10075_length_2006_cov_50.2255_g7102_i0 47.5412142.1079240.68383913.0815622.059176e-030.0439239389g71026PGD_HUMAN^6PGD_HUMAN^Q:1639-203,H:3-483^76.923%ID^E:0^RecName: Full=6-phosphogluconate dehydrogenase, decarboxylating;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo .6PGD_HUMAN^6PGD_HUMAN^Q:1-468,H:14-483^76.596%ID^E:0^RecName: Full=6-phosphogluconate dehydrogenase, decarboxylating;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo PF03446.15^NAD_binding_2^NAD binding domain of 6-phosphogluconate dehydrogenase^1-158^E:3.6e-45`PF00393.19^6PGD^6-phosphogluconate dehydrogenase, C-terminal domain^166-454^E:3.7e-132 ..COG0362^Catalyzes the oxidative decarboxylation of 6- phosphogluconate to ribulose 5-phosphate and CO(2), with concomitant reduction of NADP to NADPH (By similarity)KEGG:hsa:5226`KO:K00033 GO:0005829^cellular_component^cytosol`GO:0070062^cellular_component^extracellular exosome`GO:0005634^cellular_component^nucleus`GO:0050661^molecular_function^NADP binding`GO:0004616^molecular_function^phosphogluconate dehydrogenase (decarboxylating) activity`GO:0046177^biological_process^D-gluconate catabolic process`GO:0055114^biological_process^oxidation-reduction process`GO:0019322^biological_process^pentose biosynthetic process`GO:0006098^biological_process^pentose-phosphate shunt`GO:0009051^biological_process^pentose-phosphate shunt, oxidative branchGO:0050661^molecular_function^NADP binding`GO:0004616^molecular_function^phosphogluconate dehydrogenase (decarboxylating) activity`GO:0006098^biological_process^pentose-phosphate shunt`GO:0055114^biological_process^oxidation-reduction process..
NODE_10110_length_2001_cov_7.02312_g7128_i0 10.2969264.4313081.02219654.3947011.109251e-050.0008282910g7128GORAB_BOVIN^GORAB_BOVIN^Q:1899-961,H:1-293^32.71%ID^E:5.24e-28^RecName: Full=RAB6-interacting golgin;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae; Bovinae; Bos .GORAB_BOVIN^GORAB_BOVIN^Q:1-313,H:1-293^33.028%ID^E:2.23e-39^RecName: Full=RAB6-interacting golgin;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae; Bovinae; Bos PF04949.13^Transcrip_act^Transcriptional activator^152-296^E:2.8e-06 ..ENOG4111VTM^Golgin, RAB6-interacting . GO:0005794^cellular_component^Golgi apparatus`GO:1905515^biological_process^non-motile cilium assembly . ..
NODE_10314_length_1973_cov_79.1867_g7270_i0 51.5047331.7009030.51996233.2693441.077972e-030.0286489771g7270NSA2_HUMAN^NSA2_HUMAN^Q:1874-1107,H:1-256^77.344%ID^E:1.38e-136^RecName: Full=Ribosome biogenesis protein NSA2 homolog;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo .NSA2_HUMAN^NSA2_HUMAN^Q:1-260,H:1-260^77.692%ID^E:2.05e-153^RecName: Full=Ribosome biogenesis protein NSA2 homolog;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo PF01201.22^Ribosomal_S8e^Ribosomal protein S8e^36-259^E:5e-48 ..COG2007^40S ribosomal protein S8 KEGG:hsa:10412`KO:K14842GO:0005730^cellular_component^nucleolus`GO:0030687^cellular_component^preribosome, large subunit precursor`GO:0003723^molecular_function^RNA binding`GO:0006364^biological_process^rRNA processing . ..
NODE_1031_length_4873_cov_1.60834_g724_i0 14.3213164.1184641.04671623.9491327.843501e-050.0040845539g724 ZN710_HUMAN^ZN710_HUMAN^Q:1685-2203,H:262-427^32.768%ID^E:8.13e-16^RecName: Full=Zinc finger protein 710;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo`ZN710_HUMAN^ZN710_HUMAN^Q:1895-2200,H:493-594^31.068%ID^E:2.92e-08^RecName: Full=Zinc finger protein 710;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo`ZN710_HUMAN^ZN710_HUMAN^Q:1895-2197,H:381-481^33.333%ID^E:9.07e-08^RecName: Full=Zinc finger protein 710;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo`ZN710_HUMAN^ZN710_HUMAN^Q:1895-2200,H:465-566^29.126%ID^E:1.62e-07^RecName: Full=Zinc finger protein 710;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo`ZN710_HUMAN^ZN710_HUMAN^Q:1829-2200,H:389-510^28.906%ID^E:2.63e-07^RecName: Full=Zinc finger protein 710;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo.ZG62_XENLA^ZG62_XENLA^Q:548-653,H:8-113^38.318%ID^E:2.47e-17^RecName: Full=Gastrula zinc finger protein XlCGF62.1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia; Batrachia; Anura; Pipoidea; Pipidae; Xenopodinae; Xenopus; Xenopus`ZG62_XENLA^ZG62_XENLA^Q:548-642,H:64-159^35.417%ID^E:1.84e-12^RecName: Full=Gastrula zinc finger protein XlCGF62.1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia; Batrachia; Anura; Pipoidea; Pipidae; Xenopodinae; Xenopus; Xenopus`ZG62_XENLA^ZG62_XENLA^Q:571-648,H:4-80^38.462%ID^E:2.62e-10^RecName: Full=Gastrula zinc finger protein XlCGF62.1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia; Batrachia; Anura; Pipoidea; Pipidae; Xenopodinae; Xenopus; XenopusPF00096.26^zf-C2H2^Zinc finger, C2H2 type^573-596^E:0.00011`PF00096.26^zf-C2H2^Zinc finger, C2H2 type^602-624^E:0.00036 ... . GO:0005634^cellular_component^nucleus`GO:0003677^molecular_function^DNA binding`GO:0046872^molecular_function^metal ion binding GO:0003676^molecular_function^nucleic acid binding ..
In [43]:
#extract all genes that are significantly upregulated in the Compound Eye (negative log2fold change) but that are downregulated in the bioluminescent upper lip 
sigOE_DOWNREGULATED_logfold_Bio_UpperLip_Vs_Eye <- sigOE_Bio_UpperLip_Vs_Eye %>%
        filter(padj < padj.cutoff & log2FoldChange < lfc.cutoff)
In [44]:
colnames(sigOE_DOWNREGULATED_logfold_Bio_UpperLip_Vs_Eye)[1] <- "transcript_id"
In [45]:
#add annotation 
sigOE_DOWNREGULATED_logfold_Bio_UpperLip_Vs_Eye_annot <- left_join(sigOE_DOWNREGULATED_logfold_Bio_UpperLip_Vs_Eye,Trinotate_lym_subset,by="transcript_id")
In [11]:
head(sigOE_DOWNREGULATED_logfold_Bio_UpperLip_Vs_Eye_annot)
A tibble: 6 × 22
transcript_idbaseMeanlog2FoldChangelfcSEstatpvaluepadj#gene_idsprot_Top_BLASTX_hitRNAMMERsprot_Top_BLASTP_hitPfamSignalPTmHMMeggnogKegggene_ontology_blastgene_ontology_pfamtranscriptpeptide
<chr><dbl><dbl><dbl><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>
NODE_10126_length_1999_cov_2.60802_g7138_i0 15.963167-3.4422930.9226241 -3.7048882.114840e-048.941592e-03g7138 GST1D_ANOGA^GST1D_ANOGA^Q:1957-1379,H:1-193^50.515%ID^E:1.3e-57^RecName: Full=Glutathione S-transferase 1, isoform D;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Holometabola; Diptera; Nematocera; Culicoidea; Culicidae; Anophelinae; Anopheles.GST1D_ANOGA^GST1D_ANOGA^Q:3-191,H:1-189^51.579%ID^E:1.46e-69^RecName: Full=Glutathione S-transferase 1, isoform D;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Holometabola; Diptera; Nematocera; Culicoidea; Culicidae; Anophelinae; AnophelesPF13417.6^GST_N_3^Glutathione S-transferase, N-terminal domain^5-79^E:6.6e-13`PF02798.20^GST_N^Glutathione S-transferase, N-terminal domain^15-74^E:4.2e-11`PF13409.6^GST_N_2^Glutathione S-transferase, N-terminal domain^37-76^E:3.2e-06`PF00043.25^GST_C^Glutathione S-transferase, C-terminal domain^120-188^E:1.6e-05`PF14497.6^GST_C_3^Glutathione S-transferase, C-terminal domain^123-191^E:6e-06. . . . GO:0005737^cellular_component^cytoplasm`GO:0018833^molecular_function^DDT-dehydrochlorinase activity`GO:0004364^molecular_function^glutathione transferase activity`GO:0006749^biological_process^glutathione metabolic process GO:0005515^molecular_function^protein binding ..
NODE_101633_length_138_cov_2.06024_g94413_i0 7.667808-6.1495541.0781402 -5.5163643.460855e-084.918433e-06g94413. .. . . . . . . . ..
NODE_10218_length_1985_cov_18.5554_g7198_i0 61.993605-6.0859230.6771456 -8.8415809.437437e-193.922413e-16g7198 WSD_ACIAD^WSD_ACIAD^Q:1368-265,H:62-446^21.76%ID^E:1.15e-12^RecName: Full=O-acyltransferase WSD;^Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter .WSD_ACIAD^WSD_ACIAD^Q:196-563,H:62-446^22.005%ID^E:9.38e-13^RecName: Full=O-acyltransferase WSD;^Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; Acinetobacter PF06974.13^DUF1298^Protein of unknown function (DUF1298)^418-558^E:2.4e-27 . ExpAA=83.94^PredHel=4^Topology=i36-58o73-95i452-474o504-526iENOG410XS7A^Acyltransferase, ws dgat mgatKEGG:aci:ACIAD0832`KO:K00635GO:0102966^molecular_function^arachidoyl-CoA:1-dodecanol O-acyltransferase activity`GO:0004144^molecular_function^diacylglycerol O-acyltransferase activity`GO:0047196^molecular_function^long-chain-alcohol O-fatty-acyltransferase activity`GO:0006071^biological_process^glycerol metabolic process`GO:0019432^biological_process^triglyceride biosynthetic processGO:0004144^molecular_function^diacylglycerol O-acyltransferase activity..
NODE_1021_length_4884_cov_10.72_g718_i0 45.575618-6.5144800.8450182 -7.5173235.590900e-141.734596e-11g718 . .. . . . . . . . ..
NODE_10232_length_1983_cov_42.0814_g7210_i0 434.539937-9.2868110.7189709-12.6132621.784501e-366.551496e-33g7210 . .. . sigP:1^19^0.705^YES. . . . . ..
NODE_102471_length_136_cov_2.41975_g95251_i0 3.353358-4.7867481.3410759 -3.5717663.545819e-041.301788e-02g95251. .. . . . . . . . ..
In [4]:
#save these two dataframes for downstream analysis in Section 9.4

#genes that are significantly upregulated in Bioluminescent Upper Lip (positive log2fold change) 
head(sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye)

#genes that are significantly upregulated in the Compound Eye (negative log2fold change) but that are downregulated in the bioluminescent upper lip 
head(sigOE_DOWNREGULATED_logfold_Bio_UpperLip_Vs_Eye)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10010_length_2015_cov_7.32245_g7057_i0 7.4816914.2448361.20231983.5702753.566072e-040.0130704556
NODE_10049_length_2009_cov_1010.67_g4245_i1144.9421193.6968330.81408524.5291835.921214e-060.0004848792
NODE_10075_length_2006_cov_50.2255_g7102_i0 47.5412142.1079240.68383913.0815622.059176e-030.0439239389
NODE_10110_length_2001_cov_7.02312_g7128_i0 10.2969264.4313081.02219654.3947011.109251e-050.0008282910
NODE_10314_length_1973_cov_79.1867_g7270_i0 51.5047331.7009030.51996233.2693441.077972e-030.0286489771
NODE_1031_length_4873_cov_1.60834_g724_i0 14.3213164.1184641.04671623.9491327.843501e-050.0040845539
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10126_length_1999_cov_2.60802_g7138_i0 15.963167-3.4422930.9226241 -3.7048882.114840e-048.941592e-03
NODE_101633_length_138_cov_2.06024_g94413_i0 7.667808-6.1495541.0781402 -5.5163643.460855e-084.918433e-06
NODE_10218_length_1985_cov_18.5554_g7198_i0 61.993605-6.0859230.6771456 -8.8415809.437437e-193.922413e-16
NODE_1021_length_4884_cov_10.72_g718_i0 45.575618-6.5144800.8450182 -7.5173235.590900e-141.734596e-11
NODE_10232_length_1983_cov_42.0814_g7210_i0 434.539937-9.2868110.7189709-12.6132621.784501e-366.551496e-33
NODE_102471_length_136_cov_2.41975_g95251_i0 3.353358-4.7867481.3410759 -3.5717663.545819e-041.301788e-02

DGE - Bioluminescent Upper Lip vs Gut

In [4]:
#now set gut as reference
dds_vtsujii$group <- relevel(dds_vtsujii$group, ref= "Gut")
In [5]:
#rerun DESeq after setting a new reference
dds_vtsujii <- DESeq(dds_vtsujii)
using pre-existing size factors

estimating dispersions

found already estimated dispersions, replacing these

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

In [6]:
#Define contrasts, extract results table, and shrink the log2 fold changes

res_tableOE_unshrunken_Bio_Upper_Lip_Vs_Gut <- results(dds_vtsujii, contrast= c("group", "Upper_lip", "Gut") , alpha = 0.05)


res_tableOE_Bio_Upper_Lip_Vs_Gut  <- lfcShrink(dds_vtsujii, contrast= c("group", "Upper_lip", "Gut"), res = res_tableOE_unshrunken_Bio_Upper_Lip_Vs_Gut )
using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).

Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
Reference: https://doi.org/10.1093/bioinformatics/bty895

In [7]:
mcols(res_tableOE_Bio_Upper_Lip_Vs_Gut, use.names=T)
DataFrame with 6 rows and 2 columns
                       type                                    description
                <character>                                    <character>
baseMean       intermediate      mean of normalized counts for all samples
log2FoldChange      results log2 fold change (MAP): group Upper_lip vs Gut
lfcSE               results         standard error: group Upper_lip vs Gut
stat                results         Wald statistic: group Upper lip vs Gut
pvalue              results      Wald test p-value: group Upper lip vs Gut
padj                results                           BH adjusted p-values
In [8]:
res_tableOE_Bio_Upper_Lip_Vs_Gut_tb <- res_tableOE_Bio_Upper_Lip_Vs_Gut %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>% 
  as_tibble()
In [9]:
#all differentially expressed genes 
sigOE_Bio_Upper_Lip_Vs_Gut <- res_tableOE_Bio_Upper_Lip_Vs_Gut_tb %>%
        filter(padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)
In [10]:
#extract all genes that are significantly upregulated in the bioluminescent upper lip (positive log2 fold change)
sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut <- sigOE_Bio_Upper_Lip_Vs_Gut %>%
        filter(padj < padj.cutoff & log2FoldChange > lfc.cutoff)
In [11]:
head(sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10035_length_2011_cov_7.68814_g7072_i0 31.4974643.5491891.11161323.1856211.444434e-031.694083e-02
NODE_10049_length_2009_cov_1010.67_g4245_i1 144.9421193.7802060.80275684.7029332.564508e-066.123575e-05
NODE_10092_length_2003_cov_126.275_g7115_i0 87.9102264.2261310.90710644.6530013.271381e-067.653491e-05
NODE_10106_length_2001_cov_20.3505_g7125_i0 32.4979694.0061901.26287373.1902461.421519e-031.671594e-02
NODE_10108_length_2001_cov_9.98304_g7126_i0 28.8463754.1983820.76851695.4217095.903195e-081.951548e-06
NODE_101175_length_138_cov_7.48193_g93955_i0 3.9239754.3389581.19206673.5540693.793190e-045.366714e-03
In [12]:
colnames(sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut)[1] <- "transcript_id"
In [13]:
#add annotation 
sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut_annot <- left_join(sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut,Trinotate_lym_subset,by="transcript_id")
In [14]:
head(sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut_annot)
A tibble: 6 × 22
transcript_idbaseMeanlog2FoldChangelfcSEstatpvaluepadj#gene_idsprot_Top_BLASTX_hitRNAMMERsprot_Top_BLASTP_hitPfamSignalPTmHMMeggnogKegggene_ontology_blastgene_ontology_pfamtranscriptpeptide
<chr><dbl><dbl><dbl><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>
NODE_10035_length_2011_cov_7.68814_g7072_i0 31.4974643.5491891.11161323.1856211.444434e-031.694083e-02g7072 . .. . . .. . . . ..
NODE_10049_length_2009_cov_1010.67_g4245_i1 144.9421193.7802060.80275684.7029332.564508e-066.123575e-05g4245 ATS16_MOUSE^ATS16_MOUSE^Q:1751-417,H:93-572^25.662%ID^E:3.7e-36^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs 16;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus.ADT1_CAEEL^ADT1_CAEEL^Q:53-403,H:141-533^27.114%ID^E:4.13e-35^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs adt-1 {ECO:0000305};^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis PF13688.6^Reprolysin_5^Metallo-peptidase family M12^131-300^E:5.7e-12`PF01421.19^Reprolysin^Reprolysin (M12B) family zinc metalloprotease^135-323^E:1.9e-15`PF13582.6^Reprolysin_3^Metallo-peptidase family M12B Reprolysin-like^142-274^E:4.1e-09`PF13583.6^Reprolysin_4^Metallo-peptidase family M12B Reprolysin-like^200-286^E:3.1e-06`PF13574.6^Reprolysin_2^Metallo-peptidase family M12B Reprolysin-like^219-311^E:3.9e-08`PF17771.1^ADAM_CR_2^ADAM cysteine-rich domain^339-403^E:1.2e-09`PF17771.1^ADAM_CR_2^ADAM cysteine-rich domain^430-498^E:5.6e-05. .ENOG41104P0^Thrombospondin type 1 domainKEGG:cel:CELE_C02B4.1 GO:0005576^cellular_component^extracellular region`GO:0046872^molecular_function^metal ion binding`GO:0004222^molecular_function^metalloendopeptidase activity GO:0004222^molecular_function^metalloendopeptidase activity`GO:0006508^biological_process^proteolysis..
NODE_10092_length_2003_cov_126.275_g7115_i0 87.9102264.2261310.90710644.6530013.271381e-067.653491e-05g7115 . .LRP1_CHICK^LRP1_CHICK^Q:140-221,H:978-1057^56.098%ID^E:4.41e-19^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-220,H:852-936^47.059%ID^E:1.71e-16^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-208,H:3572-3641^52.778%ID^E:7.37e-15^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-208,H:3491-3564^55.405%ID^E:1.68e-14^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:118-220,H:3514-3624^41.071%ID^E:3.52e-14^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-208,H:934-1006^53.425%ID^E:1.45e-13^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-224,H:3610-3697^44.944%ID^E:2.73e-12^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-209,H:893-966^48.649%ID^E:3.5e-12^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:136-208,H:3330-3402^49.315%ID^E:5.14e-12^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-220,H:3450-3535^45.349%ID^E:1.51e-11^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-212,H:2732-2808^51.282%ID^E:1.73e-11^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-208,H:1013-1092^48.75%ID^E:3.75e-11^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-209,H:2690-2764^45.57%ID^E:1.5e-10^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:128-220,H:3363-3452^40.206%ID^E:2.28e-09^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-238,H:29-146^36.975%ID^E:1.22e-08^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-208,H:2856-2933^45%ID^E:2.14e-08^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-208,H:3410-3482^50%ID^E:2.73e-08^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:142-242,H:3698-3804^38.532%ID^E:4.25e-07^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:141-245,H:1107-1222^35.537%ID^E:1.24e-06^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus`LRP1_CHICK^LRP1_CHICK^Q:137-208,H:2560-2630^42.466%ID^E:2.15e-06^RecName: Full=Low-density lipoprotein receptor-related protein 1;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; GallusPF00057.18^Ldl_recept_a^Low-density lipoprotein receptor domain class A^136-170^E:2e-09`PF00057.18^Ldl_recept_a^Low-density lipoprotein receptor domain class A^176-209^E:2.5e-11`PF00059.21^Lectin_C^Lectin C-type domain^486-595^E:5.4e-13 sigP:1^23^0.647^YES.. KEGG:gga:396170`KO:K04550GO:0005905^cellular_component^clathrin-coated pit`GO:0016021^cellular_component^integral component of membrane`GO:0016964^molecular_function^alpha-2 macroglobulin receptor activity`GO:0005509^molecular_function^calcium ion binding GO:0005515^molecular_function^protein binding ..
NODE_10106_length_2001_cov_20.3505_g7125_i0 32.4979694.0061901.26287373.1902461.421519e-031.671594e-02g7125 . .. . . .. . . . ..
NODE_10108_length_2001_cov_9.98304_g7126_i0 28.8463754.1983820.76851695.4217095.903195e-081.951548e-06g7126 MRC2_HUMAN^MRC2_HUMAN^Q:318-1031,H:684-948^22.101%ID^E:1.19e-08^RecName: Full=C-type mannose receptor 2;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo .MRC2_HUMAN^MRC2_HUMAN^Q:49-286,H:684-948^22.101%ID^E:9.87e-10^RecName: Full=C-type mannose receptor 2;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo PF00059.21^Lectin_C^Lectin C-type domain^62-167^E:1.4e-14`PF00059.21^Lectin_C^Lectin C-type domain^184-287^E:2.8e-08 sigP:1^23^0.485^YES.ENOG410XQ89^Mannose receptor, C type 2 KEGG:hsa:9902`KO:K06560 GO:0005925^cellular_component^focal adhesion`GO:0016021^cellular_component^integral component of membrane`GO:0016020^cellular_component^membrane`GO:0030246^molecular_function^carbohydrate binding`GO:0005518^molecular_function^collagen binding`GO:0004888^molecular_function^transmembrane signaling receptor activity`GO:0030574^biological_process^collagen catabolic process`GO:0006897^biological_process^endocytosis`GO:0001649^biological_process^osteoblast differentiation. ..
NODE_101175_length_138_cov_7.48193_g93955_i0 3.9239754.3389581.19206673.5540693.793190e-045.366714e-03g93955. .. . . .. . . . ..
In [15]:
#extract all genes that are significantly upregulated in the Gut (negative log2fold change) but that are downregulated in the bioluminescent upper lip. 
sigOE_DOWNREGULATED_logfold_Bio_Upper_Lip_Vs_Gut <- sigOE_Bio_Upper_Lip_Vs_Gut %>%
        filter(padj < padj.cutoff & log2FoldChange < lfc.cutoff)
In [16]:
colnames(sigOE_DOWNREGULATED_logfold_Bio_Upper_Lip_Vs_Gut)[1]<- "transcript_id"
In [17]:
#add annotation
sigOE_DOWNREGULATED_logfold_Bio_Upper_Lip_Vs_Gut_annot <- left_join(sigOE_DOWNREGULATED_logfold_Bio_Upper_Lip_Vs_Gut, Trinotate_lym_subset,by="transcript_id")
In [5]:
#save these two dataframes for downstream analysis in Section 9.3

#genes that are significantly upregulated in the bioluminescent upper lip (positive log2 fold change)
head(sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut)

#genes that are significantly upregulated in the gut (negative log2fold change) but that are downregulated in the bioluminescent upper lip. 
head(sigOE_DOWNREGULATED_logfold_Bio_Upper_Lip_Vs_Gut_annot)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10035_length_2011_cov_7.68814_g7072_i0 31.4974643.5491891.11161323.1856211.444434e-031.694083e-02
NODE_10049_length_2009_cov_1010.67_g4245_i1 144.9421193.7802060.80275684.7029332.564508e-066.123575e-05
NODE_10092_length_2003_cov_126.275_g7115_i0 87.9102264.2261310.90710644.6530013.271381e-067.653491e-05
NODE_10106_length_2001_cov_20.3505_g7125_i0 32.4979694.0061901.26287373.1902461.421519e-031.671594e-02
NODE_10108_length_2001_cov_9.98304_g7126_i0 28.8463754.1983820.76851695.4217095.903195e-081.951548e-06
NODE_101175_length_138_cov_7.48193_g93955_i0 3.9239754.3389581.19206673.5540693.793190e-045.366714e-03
A tibble: 6 × 22
transcript_idbaseMeanlog2FoldChangelfcSEstatpvaluepadj#gene_idsprot_Top_BLASTX_hitRNAMMERsprot_Top_BLASTP_hitPfamSignalPTmHMMeggnogKegggene_ontology_blastgene_ontology_pfamtranscriptpeptide
<chr><dbl><dbl><dbl><dbl><dbl><dbl><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr>
NODE_10015_length_2014_cov_682.243_g7060_i0 570.345545-6.4174320.9300257-6.8644086.676754e-124.486477e-10g7060 YM9I_CAEEL^YM9I_CAEEL^Q:165-1547,H:47-510^35.789%ID^E:1.6e-90^RecName: Full=Putative serine protease F56F10.1;^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis .YM9I_CAEEL^YM9I_CAEEL^Q:35-495,H:47-510^35.789%ID^E:7.25e-93^RecName: Full=Putative serine protease F56F10.1;^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis PF05577.12^Peptidase_S28^Serine carboxypeptidase S28^52-486^E:8e-146 sigP:1^21^0.762^YES. ENOG410XSGG^protease, serine, 16 (thymus)KEGG:cel:CELE_F56F10.1 GO:0045121^cellular_component^membrane raft`GO:0008239^molecular_function^dipeptidyl-peptidase activity`GO:0008236^molecular_function^serine-type peptidase activity`GO:0045087^biological_process^innate immune response`GO:0006508^biological_process^proteolysis GO:0008236^molecular_function^serine-type peptidase activity`GO:0006508^biological_process^proteolysis ..
NODE_10040_length_2011_cov_2.41973_g7075_i0 14.886486-3.4248021.0781429-3.1345561.721142e-031.958089e-02g7075 TSN9_DANRE^TSN9_DANRE^Q:157-810,H:8-230^23.556%ID^E:7.86e-12^RecName: Full=Tetraspanin-9;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Ostariophysi; Cypriniformes; Cyprinidae; Danio .TSN9_DANRE^TSN9_DANRE^Q:12-229,H:8-230^26.222%ID^E:3.91e-25^RecName: Full=Tetraspanin-9;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Ostariophysi; Cypriniformes; Cyprinidae; Danio PF00335.20^Tetraspanin^Tetraspanin family^13-226^E:3.8e-43 . ExpAA=90.13^PredHel=4^Topology=i17-39o59-81i88-110o203-225i ENOG4111IRY^tetraspanin KEGG:dre:431733`KO:K17350GO:0005887^cellular_component^integral component of plasma membrane GO:0016021^cellular_component^integral component of membrane ..
NODE_100534_length_140_cov_1.47059_g93314_i0 21.181026-5.4347731.2391311-4.2437082.198567e-054.265253e-04g93314. .. . . . . . . . ..
NODE_10082_length_2005_cov_3.57487_g6803_i1 13.210637-2.1159220.6939908-3.0429012.343094e-032.522919e-02g6803 S35F6_HUMAN^S35F6_HUMAN^Q:12-980,H:18-336^48.457%ID^E:1.23e-96^RecName: Full=Solute carrier family 35 member F6;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo .S35F6_HUMAN^S35F6_HUMAN^Q:4-326,H:18-336^48.457%ID^E:5.73e-106^RecName: Full=Solute carrier family 35 member F6;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo PF08449.11^UAA^UAA transporter family^34-325^E:1e-08`PF08627.10^CRT-like^CRT-like, chloroquine-resistance transporter-like^40-227^E:1.1e-09`PF06027.12^SLC35F^Solute carrier family 35^62-214^E:5.6e-11`PF00892.20^EamA^EamA-like transporter family^85-146^E:5.3e-08 . ExpAA=184.31^PredHel=9^Topology=o33-55i75-97o107-124i131-153o163-185i205-224o251-270i291-305o310-332iCOG0697^membrane KEGG:hsa:54978 GO:0005829^cellular_component^cytosol`GO:0070062^cellular_component^extracellular exosome`GO:0016021^cellular_component^integral component of membrane`GO:0043231^cellular_component^intracellular membrane-bounded organelle`GO:0005765^cellular_component^lysosomal membrane`GO:0005739^cellular_component^mitochondrion`GO:0005654^cellular_component^nucleoplasm`GO:0022857^molecular_function^transmembrane transporter activity`GO:1901029^biological_process^negative regulation of mitochondrial outer membrane permeabilization involved in apoptotic signaling pathway`GO:0008284^biological_process^positive regulation of cell population proliferationGO:0055085^biological_process^transmembrane transport`GO:0022857^molecular_function^transmembrane transporter activity`GO:0016021^cellular_component^integral component of membrane`GO:0016020^cellular_component^membrane..
NODE_10087_length_2004_cov_52.6834_g7112_i0 1.431373-4.0181871.3207673-3.0097002.615061e-032.759244e-02g7112 FAS2_SCHAM^FAS2_SCHAM^Q:76-1557,H:8-555^28.905%ID^E:3.41e-63^RecName: Full=Fasciclin-2;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Polyneoptera; Orthoptera; Caelifera; Acrididea; Acridomorpha; Acridoidea; Acrididae; Cyrtacanthacridinae; Schistocerca .FAS2_SCHAM^FAS2_SCHAM^Q:15-508,H:8-555^28.905%ID^E:6.64e-64^RecName: Full=Fasciclin-2;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Polyneoptera; Orthoptera; Caelifera; Acrididea; Acridomorpha; Acridoidea; Acrididae; Cyrtacanthacridinae; Schistocerca PF07679.16^I-set^Immunoglobulin I-set domain^138-208^E:4.7e-06`PF13927.6^Ig_3^Immunoglobulin domain^285-368^E:5.6e-14`PF13895.6^Ig_2^Immunoglobulin domain^286-381^E:1.7e-07`PF07679.16^I-set^Immunoglobulin I-set domain^291-381^E:8.7e-14`PF00047.25^ig^Immunoglobulin domain^291-375^E:1e-09`PF13927.6^Ig_3^Immunoglobulin domain^395-466^E:2.7e-09`PF07679.16^I-set^Immunoglobulin I-set domain^395-476^E:1.2e-07`PF00047.25^ig^Immunoglobulin domain^397-475^E:1.5e-07`PF00057.18^Ldl_recept_a^Low-density lipoprotein receptor domain class A^497-532^E:5.9e-12`PF00057.18^Ldl_recept_a^Low-density lipoprotein receptor domain class A^537-572^E:1e-11sigP:1^25^0.734^YES. . . GO:0016021^cellular_component^integral component of membrane`GO:0007155^biological_process^cell adhesion`GO:0030154^biological_process^cell differentiation`GO:0007399^biological_process^nervous system development GO:0005515^molecular_function^protein binding ..
NODE_10088_length_2004_cov_27.3961_g7113_i0 61.316935-7.1299261.0023331-6.8501827.375626e-124.919102e-10g7113 SANT_PLAFF^SANT_PLAFF^Q:33-527,H:82-243^28.485%ID^E:8.91e-16^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:132-605,H:82-239^30.38%ID^E:9.7e-16^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:66-539,H:82-239^29.747%ID^E:3.17e-15^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:165-638,H:82-239^29.747%ID^E:3.17e-15^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:6-473,H:84-239^28.846%ID^E:3.17e-15^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:99-572,H:82-239^29.747%ID^E:3.54e-15^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:3-407,H:105-239^28.889%ID^E:2.64e-12^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania).SANT_PLAFF^SANT_PLAFF^Q:44-201,H:82-239^30.38%ID^E:4.33e-16^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:11-175,H:82-243^28.485%ID^E:6.49e-16^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:55-212,H:82-239^29.747%ID^E:1.03e-15^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:22-179,H:82-239^29.747%ID^E:1.34e-15^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:33-190,H:82-239^29.747%ID^E:1.5e-15^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:2-157,H:84-239^28.846%ID^E:1.95e-15^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)`SANT_PLAFF^SANT_PLAFF^Q:1-135,H:105-239^28.889%ID^E:1.28e-12^RecName: Full=S-antigen protein;^Eukaryota; Alveolata; Apicomplexa; Aconoidasida; Haemosporida; Plasmodiidae; Plasmodium; Plasmodium (Laverania)PF03318.13^ETX_MTX2^Clostridium epsilon toxin ETX/Bacillus mosquitocidal toxin MTX2^366-480^E:0.00011 . . . . GO:0020003^cellular_component^symbiont-containing vacuole . ..

Determine tissue-specific expression

To identify tissue-specific differential expression (i.e., significantly upregulated genes that are uniquely expressed), each tissue was compared to the other two and tissue-specific genes were extracted from the intersection of the Venn diagram (in Section 9.4.4). For example, the expression in the bioluminescent upper lip was determined by comparing it to both the compound eye and gut tissues.

Bioluminescent upper lip

In [13]:
#create a dataframe with all significantly upregulated genes of the bioluminescent upper lip. 
#merge dataframes that have significantly upregulated genes of the bioluminescent upper lip from pairwise comparisons - Bio Upper Lip vs Gut and Bio Upper Lip vs Eye. 


head(sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut) 

head(sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10035_length_2011_cov_7.68814_g7072_i0 31.4974643.5491891.11161323.1856211.444434e-031.694083e-02
NODE_10049_length_2009_cov_1010.67_g4245_i1 144.9421193.7802060.80275684.7029332.564508e-066.123575e-05
NODE_10092_length_2003_cov_126.275_g7115_i0 87.9102264.2261310.90710644.6530013.271381e-067.653491e-05
NODE_10106_length_2001_cov_20.3505_g7125_i0 32.4979694.0061901.26287373.1902461.421519e-031.671594e-02
NODE_10108_length_2001_cov_9.98304_g7126_i0 28.8463754.1983820.76851695.4217095.903195e-081.951548e-06
NODE_101175_length_138_cov_7.48193_g93955_i0 3.9239754.3389581.19206673.5540693.793190e-045.366714e-03
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10010_length_2015_cov_7.32245_g7057_i0 7.4816914.2448361.20231983.5702753.566072e-040.0130704556
NODE_10049_length_2009_cov_1010.67_g4245_i1144.9421193.6968330.81408524.5291835.921214e-060.0004848792
NODE_10075_length_2006_cov_50.2255_g7102_i0 47.5412142.1079240.68383913.0815622.059176e-030.0439239389
NODE_10110_length_2001_cov_7.02312_g7128_i0 10.2969264.4313081.02219654.3947011.109251e-050.0008282910
NODE_10314_length_1973_cov_79.1867_g7270_i0 51.5047331.7009030.51996233.2693441.077972e-030.0286489771
NODE_1031_length_4873_cov_1.60834_g724_i0 14.3213164.1184641.04671623.9491327.843501e-050.0040845539
In [21]:
colnames(sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut)[1] <- "gene"
In [22]:
colnames(sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye)[1] <- "gene"
In [23]:
#merge
merged_upper_lips_df <- rbind(
  sigOE_UPREGULATED_logfold_Bio_Upper_Lip_Vs_Gut,
  sigOE_UPREGULATED_logfold_Bio_UpperLip_Vs_Eye
)
In [24]:
head(merged_upper_lips_df)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10035_length_2011_cov_7.68814_g7072_i0 31.4974643.5491891.11161323.1856211.444434e-031.694083e-02
NODE_10049_length_2009_cov_1010.67_g4245_i1 144.9421193.7802060.80275684.7029332.564508e-066.123575e-05
NODE_10092_length_2003_cov_126.275_g7115_i0 87.9102264.2261310.90710644.6530013.271381e-067.653491e-05
NODE_10106_length_2001_cov_20.3505_g7125_i0 32.4979694.0061901.26287373.1902461.421519e-031.671594e-02
NODE_10108_length_2001_cov_9.98304_g7126_i0 28.8463754.1983820.76851695.4217095.903195e-081.951548e-06
NODE_101175_length_138_cov_7.48193_g93955_i0 3.9239754.3389581.19206673.5540693.793190e-045.366714e-03
In [25]:
# The same gene can be found in both Bio Upper Lip vs Gut and Bio Upper Lip vs Eye. Remove gene duplicates while retaining one duplicate. 
merged_upper_lips_unique <- merged_upper_lips_df[!duplicated(merged_upper_lips_df$gene), ]
In [26]:
head(merged_upper_lips_unique)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10035_length_2011_cov_7.68814_g7072_i0 31.4974643.5491891.11161323.1856211.444434e-031.694083e-02
NODE_10049_length_2009_cov_1010.67_g4245_i1 144.9421193.7802060.80275684.7029332.564508e-066.123575e-05
NODE_10092_length_2003_cov_126.275_g7115_i0 87.9102264.2261310.90710644.6530013.271381e-067.653491e-05
NODE_10106_length_2001_cov_20.3505_g7125_i0 32.4979694.0061901.26287373.1902461.421519e-031.671594e-02
NODE_10108_length_2001_cov_9.98304_g7126_i0 28.8463754.1983820.76851695.4217095.903195e-081.951548e-06
NODE_101175_length_138_cov_7.48193_g93955_i0 3.9239754.3389581.19206673.5540693.793190e-045.366714e-03

Compound eye

In [14]:
#compound eye
#create a dataframe with all significantly upregulated genes of the compound eye.  
#merge dataframes that have significantly upregulated genes of the compound eye from pairwise comparisons - Bio Upper Lip vs Eye and Gut vs Eye. 


head(sigOE_DOWNREGULATED_logfold_Bio_UpperLip_Vs_Eye)

head(sigOE_DOWNREGULATED_logfold_Gut_vs_Eye)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10126_length_1999_cov_2.60802_g7138_i0 15.963167-3.4422930.9226241 -3.7048882.114840e-048.941592e-03
NODE_101633_length_138_cov_2.06024_g94413_i0 7.667808-6.1495541.0781402 -5.5163643.460855e-084.918433e-06
NODE_10218_length_1985_cov_18.5554_g7198_i0 61.993605-6.0859230.6771456 -8.8415809.437437e-193.922413e-16
NODE_1021_length_4884_cov_10.72_g718_i0 45.575618-6.5144800.8450182 -7.5173235.590900e-141.734596e-11
NODE_10232_length_1983_cov_42.0814_g7210_i0 434.539937-9.2868110.7189709-12.6132621.784501e-366.551496e-33
NODE_102471_length_136_cov_2.41975_g95251_i0 3.353358-4.7867481.3410759 -3.5717663.545819e-041.301788e-02
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10030_length_2012_cov_88.44_g145_i5 77.516326-2.0043520.7483891-2.6772667.422565e-034.264394e-02
NODE_1009_length_4897_cov_2.69228_g709_i0 18.842542-1.1065160.4161180-2.6597357.820226e-034.417345e-02
NODE_10106_length_2001_cov_20.3505_g7125_i0 32.497969-3.7863031.2661859-3.0125302.590799e-031.957288e-02
NODE_10108_length_2001_cov_9.98304_g7126_i0 28.846375-4.7706260.7722330-6.1236489.145690e-105.682196e-08
NODE_101175_length_138_cov_7.48193_g93955_i0 3.923975-3.8243711.2262699-3.0757522.099721e-031.670540e-02
NODE_10126_length_1999_cov_2.60802_g7138_i0 15.963167-5.2191880.9667613-5.3075571.111044e-073.921827e-06
In [29]:
colnames(sigOE_DOWNREGULATED_logfold_Bio_UpperLip_Vs_Eye)[1]<- "gene"
In [30]:
#merged eye
merged_Eye_df <- rbind(sigOE_DOWNREGULATED_logfold_Bio_UpperLip_Vs_Eye , sigOE_DOWNREGULATED_logfold_Gut_vs_Eye)
In [31]:
#The same gene can be found in both Bio Upper Lip vs Eye and Gut vs Eye. Remove gene duplicates while retaining one duplicate. 
merged_Eye_unique <-merged_Eye_df[!duplicated(merged_Eye_df$gene), ]
In [32]:
head(merged_Eye_unique)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10126_length_1999_cov_2.60802_g7138_i0 15.963167-3.4422930.9226241 -3.7048882.114840e-048.941592e-03
NODE_101633_length_138_cov_2.06024_g94413_i0 7.667808-6.1495541.0781402 -5.5163643.460855e-084.918433e-06
NODE_10218_length_1985_cov_18.5554_g7198_i0 61.993605-6.0859230.6771456 -8.8415809.437437e-193.922413e-16
NODE_1021_length_4884_cov_10.72_g718_i0 45.575618-6.5144800.8450182 -7.5173235.590900e-141.734596e-11
NODE_10232_length_1983_cov_42.0814_g7210_i0 434.539937-9.2868110.7189709-12.6132621.784501e-366.551496e-33
NODE_102471_length_136_cov_2.41975_g95251_i0 3.353358-4.7867481.3410759 -3.5717663.545819e-041.301788e-02

Gut

In [15]:
#create a dataframe with all significantly upregulated genes of the gut.  
#merge dataframes that have significantly upregulated genes of the gut from pairwise comparisons - Gut vs Eye and Bio Upper Lip vs Gut. 

head(sigOE_UPREGULATED_logfold_Gut_vs_Eye)

head(sigOE_DOWNREGULATED_logfold_Bio_Upper_Lip_Vs_Gut)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10010_length_2015_cov_7.32245_g7057_i0 7.4816914.3290051.19867233.6464322.659066e-043.236611e-03
NODE_10015_length_2014_cov_682.243_g7060_i0 570.3455457.1526040.95577687.4040691.320736e-132.149895e-11
NODE_10040_length_2011_cov_2.41973_g7075_i0 14.8864865.9607081.25773134.6833272.822558e-066.652356e-05
NODE_10047_length_2010_cov_1.26305_g7081_i0 2.1085493.0922971.11612362.7754705.512192e-033.450172e-02
NODE_100534_length_140_cov_1.47059_g93314_i0 21.1810266.4151441.33791754.6755692.931403e-066.869105e-05
NODE_1006_length_4905_cov_1.0567_g707_i0 2.2374503.4829541.22488122.8148334.880265e-033.151585e-02
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10015_length_2014_cov_682.243_g7060_i0 570.345545-6.4174320.9300257-6.8644086.676754e-124.486477e-10
NODE_10040_length_2011_cov_2.41973_g7075_i0 14.886486-3.4248021.0781429-3.1345561.721142e-031.958089e-02
NODE_100534_length_140_cov_1.47059_g93314_i0 21.181026-5.4347731.2391311-4.2437082.198567e-054.265253e-04
NODE_10082_length_2005_cov_3.57487_g6803_i1 13.210637-2.1159220.6939908-3.0429012.343094e-032.522919e-02
NODE_10087_length_2004_cov_52.6834_g7112_i0 1.431373-4.0181871.3207673-3.0097002.615061e-032.759244e-02
NODE_10088_length_2004_cov_27.3961_g7113_i0 61.316935-7.1299261.0023331-6.8501827.375626e-124.919102e-10
In [35]:
colnames(sigOE_UPREGULATED_logfold_Gut_vs_Eye)[1] <- "gene"
In [36]:
colnames(sigOE_DOWNREGULATED_logfold_Bio_Upper_Lip_Vs_Gut)[1]<- "gene"
In [37]:
#merge
merged_Gut_df <- rbind(sigOE_UPREGULATED_logfold_Gut_vs_Eye , sigOE_DOWNREGULATED_logfold_Bio_Upper_Lip_Vs_Gut)

#The same gene can be found in both Gut vs Eye and Bio Upper Lip vs Eye. Remove gene duplicates while retaining one duplicate.
merged_Gut_unique <-merged_Gut_df[!duplicated(merged_Gut_df$gene), ]
In [38]:
head(merged_Gut_unique)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10010_length_2015_cov_7.32245_g7057_i0 7.4816914.3290051.19867233.6464322.659066e-043.236611e-03
NODE_10015_length_2014_cov_682.243_g7060_i0 570.3455457.1526040.95577687.4040691.320736e-132.149895e-11
NODE_10040_length_2011_cov_2.41973_g7075_i0 14.8864865.9607081.25773134.6833272.822558e-066.652356e-05
NODE_10047_length_2010_cov_1.26305_g7081_i0 2.1085493.0922971.11612362.7754705.512192e-033.450172e-02
NODE_100534_length_140_cov_1.47059_g93314_i0 21.1810266.4151441.33791754.6755692.931403e-066.869105e-05
NODE_1006_length_4905_cov_1.0567_g707_i0 2.2374503.4829541.22488122.8148334.880265e-033.151585e-02

Venn Diagram - Extract tissue-specific genes

In [40]:
#generate a venn diagram to visualize shared significantly upregulated genes across tissue types and extract the genes that are unique to each tissue type. 
unique_venn_list <- list(
  Bio_Upper_Lip = merged_upper_lips_unique$gene  , 
  Gut = merged_Gut_unique$gene,
  Compound_Eye = merged_Eye_unique$gene
)

ggvenn_unique <- ggvenn(
  unique_venn_list, 
  fill_color = c('#AC97C9','#C97D97', '#F2C93D'),
  stroke_size = .7, set_name_size = 6, text_size = 5
)

ggvenn_unique
In [3]:
# Open a PDF device
pdf("Luminous_DGE.pdf", width = 8, height = 6)


ggvenn_unique 

dev.off()
png: 2
In [41]:
#prep dataframes for extraction
Bio_Upper_Lip <- as.data.frame(merged_upper_lips_unique$gene)
colnames(Bio_Upper_Lip)[1]<- "gene"
Gut <- as.data.frame(merged_Gut_unique$gene)
colnames(Gut)[1]<- "gene"
Compound_eye <- as.data.frame(merged_Eye_unique$gene)
colnames(Compound_eye)[1]<- "gene"
In [42]:
# compare and extract unique genes for each tissue type
unique_genes_bio_upper_lip <- anti_join(Bio_Upper_Lip, Gut, by = "gene") %>%
  anti_join(Compound_eye, by = "gene")

unique_genes_gut <- anti_join(Gut, Bio_Upper_Lip, by = "gene") %>%
  anti_join(Compound_eye, by = "gene")

unique_genes_compound_eye <- anti_join(Compound_eye, Bio_Upper_Lip, by = "gene") %>%
  anti_join(Gut, by = "gene")
In [43]:
nrow(unique_genes_bio_upper_lip)
595
In [44]:
nrow(unique_genes_gut)
2534
In [45]:
nrow(unique_genes_compound_eye)
1144
In [46]:
#add the annotations back to the unique genes in each tissue type by subsetting
In [47]:
unique_genes_bio_upper_lip_info  <- merged_upper_lips_unique %>%
  filter(gene %in% unique_genes_bio_upper_lip$gene)
In [48]:
nrow(unique_genes_bio_upper_lip_info)
595
In [49]:
unique_genes_eye_info  <- merged_Eye_unique %>%
  filter(gene %in% unique_genes_compound_eye$gene)
In [50]:
nrow(unique_genes_eye_info)
1144
In [51]:
unique_genes_gut_info  <- merged_Gut_unique %>%
  filter(gene %in% unique_genes_gut$gene)
In [97]:
nrow(unique_genes_gut_info)
2534
In [52]:
#add annotations back 
colnames(unique_genes_bio_upper_lip_info)[1]<- "transcript_id"
unique_genes_bio_upper_lip_info_annot <- left_join(unique_genes_bio_upper_lip_info,Trinotate_lym_subset,by="transcript_id")
In [53]:
#add annotations back 
colnames(unique_genes_eye_info)[1]<- "transcript_id"
unique_genes_eye_info_annot <- left_join(unique_genes_eye_info,Trinotate_lym_subset,by="transcript_id")
In [55]:
#add annotations backs
colnames(unique_genes_gut_info)[1] <- "transcript_id"
unique_genes_gut_info_annot <- left_join(unique_genes_gut_info,Trinotate_lym_subset,by="transcript_id")
In [107]:
#write.csv(unique_genes_bio_upper_lip_info_annot, file = "df_Vtsujii_sigfig_upreg_unique_BioUpperLip.csv")
In [108]:
#write.csv(unique_genes_eye_info_annot, file = "df_Vtsujii_sigfig_upreg_unique_comEye.csv")
In [109]:
#write.csv(unique_genes_gut_info_annot, file = "df_Vtsujii_sigfig_upreg_unique_Gut.csv")

DGE - GO enrichment analyses for bioluminescent upper lip

The topGO package was used to perform GO enrichment analysis on significantly upregulated genes (i.e., uniquely expressed) in the bioluminescent upper lip. (Alexa Rahnenfuhrer, 2024). The figures representing the GO enrichments presented here were utilized for analysis, but were not included in the main text of the publication. The GO enrichments were visualized with GO-Figure! package for publication (Reijnders and Waterhouse, 2021). GO-Figure! reference https://github.com/lmesrop/BCN_publication/tree/main/Go-Figure!.

In [62]:
#import the trinotate go sheet from Trinotate output 
geneID2GO_bio <- readMappings(file ="Trinotate_go_lym.txt")
In [65]:
#significantly upregulated genes (i.e. expressed uniquely) in the bioluminescent upper lip 

head(unique_genes_bio_upper_lip_info)
A tibble: 6 × 7
transcript_idbaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
NODE_10035_length_2011_cov_7.68814_g7072_i0 31.4974643.5491891.11161323.1856211.444434e-031.694083e-02
NODE_10049_length_2009_cov_1010.67_g4245_i1 144.9421193.7802060.80275684.7029332.564508e-066.123575e-05
NODE_10092_length_2003_cov_126.275_g7115_i0 87.9102264.2261310.90710644.6530013.271381e-067.653491e-05
NODE_10123_length_1999_cov_24.5818_g6951_i1 12.9709451.9215760.68238472.8150524.876929e-034.595970e-02
NODE_10147_length_1995_cov_11.3794_g7152_i0 7.9424923.9612351.16034613.3728717.438888e-049.602505e-03
NODE_102011_length_137_cov_2.19512_g94791_i0 83.5905496.0765691.69826373.9424798.064363e-051.377157e-03
In [134]:
#save the transcript ids of all the annotated genes under geneNames object 
geneNames_bio<- as.character(Trinotate_lym_subset$transcript_id)
In [27]:
#save the transcript 
myInterestingGenes_bio= as.character(unique_genes_bio_upper_lip_info$gene)
In [29]:
#subset the genesNames by the transcript IDs in my red module 
geneList_bio <- factor(as.integer(geneNames_bio %in% myInterestingGenes_bio))
names(geneList_bio) <- geneNames_bio
head(geneList_bio)
NODE_100000_length_141_cov_1.05814_g92780_i0
0
NODE_100001_length_141_cov_1.03488_g92781_i0
0
NODE_100002_length_141_cov_0.325581_g92782_i0
0
NODE_100003_length_141_cov_0.0116279_g92783_i0
0
NODE_100004_length_141_cov_0_g92784_i0
0
NODE_100005_length_141_cov_0_g92785_i0
0
Levels:
  1. '0'
  2. '1'

Use topGO to identify enriched biological processes in the bioluminescent upper lip

In [30]:
#run the topGO function.
GOdata_bio <- new("topGOdata", ontology = "BP", allGenes = geneList_bio,
                    annot = annFUN.gene2GO, gene2GO = geneID2GO_bio)
Building most specific GOs .....

	( 12645 GO terms found. )


Build GO DAG topology ..........

	( 13427 GO terms and 31130 relations. )


Annotating nodes ...............

	( 15506 genes annotated to the GO terms. )

In [31]:
results_go_bio <- runTest(GOdata_bio, algorithm="weight01", statistic="Fisher")
			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 1881 nontrivial nodes
		 parameters: 
			 test statistic: fisher


	 Level 16:	1 nodes to be scored	(0 eliminated genes)


	 Level 15:	4 nodes to be scored	(0 eliminated genes)


	 Level 14:	8 nodes to be scored	(6 eliminated genes)


	 Level 13:	25 nodes to be scored	(115 eliminated genes)


	 Level 12:	56 nodes to be scored	(265 eliminated genes)


	 Level 11:	94 nodes to be scored	(1648 eliminated genes)


	 Level 10:	139 nodes to be scored	(3396 eliminated genes)


	 Level 9:	203 nodes to be scored	(4708 eliminated genes)


	 Level 8:	243 nodes to be scored	(5830 eliminated genes)


	 Level 7:	302 nodes to be scored	(8018 eliminated genes)


	 Level 6:	315 nodes to be scored	(11122 eliminated genes)


	 Level 5:	254 nodes to be scored	(12818 eliminated genes)


	 Level 4:	149 nodes to be scored	(14217 eliminated genes)


	 Level 3:	70 nodes to be scored	(14877 eliminated genes)


	 Level 2:	17 nodes to be scored	(15171 eliminated genes)


	 Level 1:	1 nodes to be scored	(15489 eliminated genes)

In [45]:
#retrieve the GO enrichment 
goEnrichment_bio   <- GenTable(GOdata_bio, Fisher = results_go_bio, orderBy = "Fisher", topNodes =100, numChar =1000 )
In [46]:
goEnrichment_bio$Fisher <- as.numeric(goEnrichment_bio$Fisher)
goEnrichment_bio <- goEnrichment_bio[goEnrichment_bio$Fisher < 0.05,] 
goEnrichment_bio <- goEnrichment_bio[goEnrichment_bio$Significant > 1,] 
#goEnrichment_bio <- goEnrichment_bio[,c("GO.ID","Term", "Annotated","Significant", "Expected", "Fisher")]
In [47]:
goEnrichment_bio
A data.frame: 48 × 6
GO.IDTermAnnotatedSignificantExpectedFisher
<chr><chr><int><int><dbl><dbl>
1GO:0006508proteolysis 15402513.413.400e-09
2GO:0008218bioluminescence 46 6 0.402.700e-06
3GO:0032504multicellular organism reproduction 96910 8.448.700e-06
4GO:0001519peptide amidation 19 4 0.171.900e-05
5GO:0006030chitin metabolic process 76 6 0.665.200e-05
6GO:0018401peptidyl-proline hydroxylation to 4-hydroxy-L-proline 25 4 0.226.000e-05
7GO:0032849positive regulation of cellular pH reduction 10 3 0.097.400e-05
8GO:0006556S-adenosylmethionine biosynthetic process 10 3 0.097.400e-05
9GO:2001225regulation of chloride transport 10 3 0.097.400e-05
10GO:0009620response to fungus 60 5 0.529.400e-05
11GO:0032230positive regulation of synaptic transmission, GABAergic 11 3 0.101.000e-04
12GO:0007586digestion 93 7 0.811.100e-04
13GO:0036378calcitriol biosynthetic process from calciol 3 2 0.032.200e-04
14GO:0010164response to cesium ion 3 2 0.032.200e-04
15GO:0048771tissue remodeling 76 6 0.663.100e-04
16GO:0042730fibrinolysis 17 3 0.154.000e-04
17GO:2001150positive regulation of dipeptide transmembrane transport 4 2 0.034.500e-04
18GO:0015670carbon dioxide transport 5 2 0.047.400e-04
19GO:0044719regulation of imaginal disc-derived wing size 49 4 0.438.600e-04
20GO:0038166angiotensin-activated signaling pathway 6 2 0.051.100e-03
21GO:0045780positive regulation of bone resorption 8 2 0.072.040e-03
22GO:0007218neuropeptide signaling pathway 32 3 0.282.660e-03
23GO:0010043response to zinc ion 41 3 0.365.400e-03
24GO:0001658branching involved in ureteric bud morphogenesis 13 2 0.115.510e-03
25GO:0045672positive regulation of osteoclast differentiation 14 2 0.126.390e-03
26GO:0006730one-carbon metabolic process 45 3 0.397.010e-03
29GO:0009405pathogenesis 20 2 0.171.290e-02
30GO:0071498cellular response to fluid shear stress 20 2 0.171.290e-02
31GO:0007613memory 109 4 0.951.514e-02
32GO:0030574collagen catabolic process 22 2 0.191.551e-02
33GO:0042475odontogenesis of dentin-containing tooth 23 2 0.201.689e-02
37GO:0015701bicarbonate transport 25 2 0.221.980e-02
38GO:0007585respiratory gaseous exchange 25 2 0.221.980e-02
39GO:0055085transmembrane transport 105015 9.142.183e-02
40GO:0055114oxidation-reduction process 100315 8.732.223e-02
41GO:0042738exogenous drug catabolic process 27 2 0.242.291e-02
42GO:0015771trehalose transport 27 2 0.242.291e-02
43GO:0046903secretion 755 8 6.572.302e-02
44GO:0003073regulation of systemic arterial blood pressure 28 2 0.242.453e-02
45GO:0042359vitamin D metabolic process 6 3 0.052.552e-02
54GO:0030206chondroitin sulfate biosynthetic process 31 2 0.272.967e-02
55GO:0010092specification of animal organ identity 21 2 0.183.416e-02
66GO:0043434response to peptide hormone 232 5 2.023.459e-02
67GO:0009268response to pH 37 2 0.324.109e-02
68GO:0006629lipid metabolic process 1154 910.054.271e-02
79GO:1902017regulation of cilium assembly 38 2 0.334.313e-02
80GO:0008543fibroblast growth factor receptor signaling pathway 38 2 0.334.313e-02
81GO:0002009morphogenesis of an epithelium 55813 4.864.514e-02
In [106]:
#write.table(goEnrichment_bio, "df_TopGO_Vargula_tsujii_DE_unique_BioUpperLip_BP.tsv",sep = "\t", quote=FALSE)
In [36]:
myterms_bio =goEnrichment_bio$GO.ID 
mygenes_bio = genesInTerm(GOdata_bio, myterms_bio)
In [ ]:
#extract the transcript ids for each GO term
for (i in 1:length(myterms_bio))
{
   myterm_bio <- myterms_bio[i]
   mygenesforterm_bio <- mygenes_bio[myterm_bio][[1]]
   myfactor_bio <- mygenesforterm_bio %in% myInterestingGenes_bio 
   mygenesforterm2_bio <- mygenesforterm_bio[myfactor_bio == TRUE]
   mygenesforterm2_bio <- paste(mygenesforterm2_bio, collapse=',')
   print(paste("Term",myterm_bio,"genes:",mygenesforterm2_bio))
}
In [51]:
ntop = 48
ggdata <- goEnrichment_bio[1:ntop,]
ggdata$Term <- factor(ggdata$Term, levels = rev(ggdata$Term)) 
plot_Bio_UL_BP <- ggplot(ggdata,
  aes(x = Term, y = -log10(Fisher), size = Significant, fill = -log10(Fisher))) +

 ylim(1,11) +
  geom_point(shape = 21) +
  scale_size(range = c(2.5,12.5)) +
  scale_fill_continuous(low = 'royalblue', high = 'red4') +

  labs(
    title = 'GO Analysis - Biological Process')+
 theme_bw(base_size = 24) +
labs(size= "Number of Genes")+
  theme(
    legend.position = 'right',
    legend.background = element_rect(),
    plot.title = element_text(angle = 0, size = 16, face = 'bold', vjust = 1),
    plot.subtitle = element_text(angle = 0, size = 10, face = 'bold', vjust = 1),
    plot.caption = element_text(angle = 0, size = 12, face = 'bold', vjust = 1),

   axis.text.x = element_blank(),
    axis.text.y = element_text(angle = 0, size = 13, face = 'bold', vjust = 0.5),
    #axis.title = element_text(size = 12, face = 'bold'),
    axis.title.x = element_blank(),
    #axis.title.y = element_text(size = 8, face = 'bold'),
    axis.line = element_line(colour = 'black'),

    #Legend
    legend.key = element_blank(), 
    legend.text = element_text(size = 14, face = "bold"), 
    title = element_text(size = 14, face = "bold")) +



  coord_flip()
#dev.off()
In [63]:
plot_Bio_UL_BP + labs(x = NULL)
In [62]:
options(repr.plot.width=10, repr.plot.height=8, repr.plot.res = 500)

Use topGO to identify enriched molecular functions in the bioluminescent upper lip

In [53]:
#run the topGO function.
GOdata_bio_MF <- new("topGOdata", ontology = "MF", allGenes = geneList_bio,
                    annot = annFUN.gene2GO, gene2GO = geneID2GO_bio)
Building most specific GOs .....

	( 3583 GO terms found. )


Build GO DAG topology ..........

	( 3618 GO terms and 4733 relations. )


Annotating nodes ...............

	( 16040 genes annotated to the GO terms. )

In [54]:
results_go_bio_MF <- runTest(GOdata_bio_MF, algorithm="weight01", statistic="Fisher")
			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 401 nontrivial nodes
		 parameters: 
			 test statistic: fisher


	 Level 12:	1 nodes to be scored	(0 eliminated genes)


	 Level 11:	2 nodes to be scored	(0 eliminated genes)


	 Level 10:	3 nodes to be scored	(7 eliminated genes)


	 Level 9:	11 nodes to be scored	(38 eliminated genes)


	 Level 8:	28 nodes to be scored	(58 eliminated genes)


	 Level 7:	54 nodes to be scored	(2776 eliminated genes)


	 Level 6:	79 nodes to be scored	(3524 eliminated genes)


	 Level 5:	92 nodes to be scored	(6256 eliminated genes)


	 Level 4:	88 nodes to be scored	(8602 eliminated genes)


	 Level 3:	33 nodes to be scored	(13147 eliminated genes)


	 Level 2:	9 nodes to be scored	(14316 eliminated genes)


	 Level 1:	1 nodes to be scored	(15865 eliminated genes)

In [55]:
#retrieve the GO enrichment 
goEnrichment_bio_MF   <- GenTable(GOdata_bio_MF, Fisher = results_go_bio_MF, orderBy = "Fisher", topNodes =100, numChar =1000 )
In [56]:
goEnrichment_bio_MF$Fisher <- as.numeric(goEnrichment_bio_MF$Fisher)
goEnrichment_bio_MF <- goEnrichment_bio_MF[goEnrichment_bio_MF$Fisher < 0.05,] 
goEnrichment_bio_MF <- goEnrichment_bio_MF[goEnrichment_bio_MF$Significant > 1,] 
goEnrichment_bio_MF <- goEnrichment_bio_MF[,c("GO.ID","Term", "Annotated","Significant", "Expected", "Fisher")]
In [57]:
goEnrichment_bio_MF
A data.frame: 29 × 6
GO.IDTermAnnotatedSignificantExpectedFisher
<chr><chr><int><int><dbl><dbl>
1GO:0004252serine-type endopeptidase activity 241132.280.0000005
2GO:0004222metalloendopeptidase activity 118 91.120.0000018
3GO:0004089carbonate dehydratase activity 26 50.250.0000040
4GO:0047712Cypridina-luciferin 2-monooxygenase activity 46 60.440.0000045
5GO:0004478methionine adenosyltransferase activity 7 30.070.0000280
6GO:0004598peptidylamidoglycolate lyase activity 20 40.190.0000330
7GO:0004504peptidylglycine monooxygenase activity 20 40.190.0000330
8GO:0008061chitin binding 119 71.130.0001400
9GO:0030343vitamin D3 25-hydroxylase activity 3 20.030.0002700
10GO:0004867serine-type endopeptidase inhibitor activity 95 60.900.0002800
11GO:0004181metallocarboxypeptidase activity 36 40.340.0003600
12GO:0016831carboxy-lyase activity 101 60.960.0004000
13GO:0005184neuropeptide hormone activity 5 30.050.0005200
14GO:0004574oligo-1,6-glucosidase activity 5 20.050.0008800
15GO:0004575sucrose alpha-glucosidase activity 5 20.050.0008800
16GO:0004656procollagen-proline 4-dioxygenase activity 28 30.270.0023000
17GO:0004064arylesterase activity 9 20.090.0030700
18GO:0005507copper ion binding 65 40.620.0033500
19GO:0046422violaxanthin de-epoxidase activity 10 20.090.0038200
20GO:0031418L-ascorbic acid binding 41 30.390.0068300
21GO:0005509calcium ion binding 640136.060.0078900
23GO:0080030methyl indole-3-acetate esterase activity 17 20.160.0110500
25GO:0015459potassium channel regulator activity 24 20.230.0214800
26GO:0016702oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen 63 30.600.0219000
27GO:0015151alpha-glucoside transmembrane transporter activity 26 20.250.0249900
28GO:0015574trehalose transmembrane transporter activity 26 20.250.0249900
29GO:0018833DDT-dehydrochlorinase activity 26 20.250.0249900
34GO:0005506iron ion binding 265 62.510.0405900
35GO:0008395steroid hydroxylase activity 34 20.320.0410700
In [60]:
ntop = 29
ggdata <- goEnrichment_bio_MF[1:ntop,]
ggdata$Term <- factor(ggdata$Term, levels = rev(ggdata$Term)) 
plot_Bio_UL_MF <- ggplot(ggdata,
  aes(x = Term, y = -log10(Fisher), size = Significant, fill = -log10(Fisher))) +

 ylim(1,11) +
  geom_point(shape = 21) +
  scale_size(range = c(2.5,12.5)) +
  scale_fill_continuous(low = 'royalblue', high = 'red4') +

  labs(
    title = 'GO Analysis - Molecular Function')+
 theme_bw(base_size = 24) +
labs(size= "Number of Genes")+
  theme(
    legend.position = 'right',
    legend.background = element_rect(),
    plot.title = element_text(angle = 0, size = 16, face = 'bold', vjust = 1),
    plot.subtitle = element_text(angle = 0, size = 10, face = 'bold', vjust = 1),
    plot.caption = element_text(angle = 0, size = 12, face = 'bold', vjust = 1),

   axis.text.x = element_blank(),
    axis.text.y = element_text(angle = 0, size = 13, face = 'bold', vjust = 0.5),
    #axis.title = element_text(size = 12, face = 'bold'),
    axis.title.x = element_blank(),
    #axis.title.y = element_text(size = 8, face = 'bold'),
    axis.line = element_line(colour = 'black'),

    #Legend
    legend.key = element_blank(), 
    legend.text = element_text(size = 14, face = "bold"), 
    title = element_text(size = 14, face = "bold")) +



  coord_flip()
#dev.off()
In [70]:
plot_Bio_UL_MF + labs(x=NULL)
In [69]:
options(repr.plot.width=16, repr.plot.height=8, repr.plot.res = 500)

References

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008)

M. I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014)

Alexa A, Rahnenfuhrer J, topGO: Enrichment Analysis for Gene Ontology (2024)

Shannon, Paul et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research vol. 13,11 (2003)

In [ ]: